Trusty Deployment w/ RAID storage config fails because Trusty images do not contain RAID kernel modules

Bug #1519470 reported by Jeff Lane 
This bug affects 3 people
Affects       Status         Importance  Assigned to   Milestone
MAAS          Invalid        Critical    Unassigned
maas-images   Fix Released   Medium      Scott Moser

Bug Description

I have a server with 2x 2TB SATA disks that I am attempting to deploy with a software RAID 1 scheme via MAAS 1.9 RC2, installed this morning from the maas/next PPA.

maas:
  Installed: 1.9.0~rc2+bzr4509-0ubuntu1~trusty1
  Candidate: 1.9.0~rc2+bzr4509-0ubuntu1~trusty1
  Version table:
 *** 1.9.0~rc2+bzr4509-0ubuntu1~trusty1 0
        500 http://ppa.launchpad.net/maas/next/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

To create this scheme, I deleted all the default LVM configuration from the MAAS storage config area for my node.

Next, I selected the two disks (sda, sdb) and clicked Create RAID

I set up a RAID 1, formatted it as ext4, and mounted it as root.

I then attempted to deploy 15.10 from the Release stream using this schema.

This failed for some reason. It always fails when I attempt to create a RAID scheme. If I use the default LVM scheme, I can deploy this server with 15.10.

However, when I try a RAID scheme, it lays down the filesystem and the last message on the console is about generating ssh keys followed by:

cloud-init[1153]: Cloud-init v. 0.7.7 finished at Tue, 24 Nov 2015 20:12:19 +0000. Datasource DataSourceMAAS [http://10.0.0.1/MAAS/metadata/curtin]. Up 361.02 seconds

I am attaching the full dump of /var/log from the node in a failed state.

Revision history for this message
Jeff Lane  (bladernr) wrote :

OK, next I set the default schema to flat and re-commissioned and did a deployment. That too was successful. So after testing that, I "unmounted" and removed the formatting for sda-part1.

I created sdb-part1 and then selected both sda-part1 and sdb-part1 and clicked "Create RAID".

Then I created a RAID 0 using those partitions, formatted it as ext4, and mounted it at /.

That left me with this:

Used disks and partitions

Name|Model|Serial   Boot   Device type   Used for
md0                        RAID 0        ext4 formatted filesystem mounted at /
sda                        Physical      MBR partitioned with 1 partition
sda-part1                  Partition     Active raid-0 device for md0
sdb                        Physical      GPT partitioned with 1 partition
sdb-part1                  Partition     Active raid-0 device for md0

Not sure why it opts for GPT on the one I just created, but MBR when it automatically creates a flat system; that seems incongruous.

Revision history for this message
Jeff Lane  (bladernr) wrote :

That second/fourth attempt also failed. So, so far, the ONLY way I can successfully deploy is using either the default LVM scheme or the default flat scheme.

I have not attempted a more complex flat or LVM custom scheme yet, I really just wanted to set up RAID.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Please provide the installation log from the MAAS UI for the deploying node.

Also if you could provide the output of "maas <session> node read <system_id>" for the node that would be helpful as well.
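
For reference, a sketch of what that looks like end to end (the profile name and API key are placeholders; the system ID and server address are the ones that appear elsewhere in this bug):

maas login myprofile http://10.0.0.1/MAAS/api/1.0/ <api-key>
maas myprofile node read node-28402790-92c5-11e5-a8a3-eca86bfb9f66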

Thanks,
Blake

Changed in maas:
status: New → Incomplete
Revision history for this message
Jeff Lane  (bladernr) wrote :

Node status event - 'cloudinit' running modules for final Wed, 25 Nov. 2015 14:44:27
Node status event - 'cloudinit' config-power-state-change ran successfully Wed, 25 Nov. 2015 14:44:27
Node status event - 'cloudinit' running config-power-state-change with frequency once-per-instance Wed, 25 Nov. 2015 14:44:27
Node status event - 'cloudinit' config-final-message ran successfully Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' running config-final-message with frequency always Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' config-phone-home ran successfully Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' running config-phone-home with frequency once-per-instance Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' config-keys-to-console ran successfully Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' running config-keys-to-console with frequency once-per-instance Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' config-ssh-authkey-fingerprints ran successfully Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' running config-ssh-authkey-fingerprints with frequency once-per-instance Wed, 25 Nov. 2015 14:44:26
Node status event - 'cloudinit' running config-scripts-user with frequency once-per-instance Wed, 25 Nov. 2015 14:44:26
Node changed status - From 'Deploying' to 'Failed deployment' Wed, 25 Nov. 2015 14:44:26
Node installation - 'curtin' curtin command install Wed, 25 Nov. 2015 14:44:26
Node installation - 'curtin' configuring installed system Wed, 25 Nov. 2015 14:44:24
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:44:24
Node installation - 'curtin' curtin command curthooks Wed, 25 Nov. 2015 14:44:24
Node installation - 'curtin' curtin command curthooks Wed, 25 Nov. 2015 14:43:33
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:32
Node installation - 'curtin' configuring installed system Wed, 25 Nov. 2015 14:43:32
Node installation - 'curtin' writing install sources to disk Wed, 25 Nov. 2015 14:43:32
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:32
Node installation - 'curtin' curtin command extract Wed, 25 Nov. 2015 14:43:32
Node installation - 'curtin' curtin command extract Wed, 25 Nov. 2015 14:43:18
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:18
Node installation - 'curtin' writing install sources to disk Wed, 25 Nov. 2015 14:43:18
Node installation - 'curtin' configuring network Wed, 25 Nov. 2015 14:43:18
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:18
Node installation - 'curtin' curtin command net-meta Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' curtin command net-meta Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' configuring network Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' configuring storage Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' running 'builtin' Wed, 25 Nov. 2015 14:43:17
Node installation - 'curtin' curtin command block-meta Wed, 25 Nov. 2015 14:43:16
Node installation - 'curtin' curtin command block-meta Wed, 25 N...

Revision history for this message
Jeff Lane  (bladernr) wrote :

From maas <session> nodes list

 {
        "hwe_kernel": "hwe-w",
        "ip_addresses": [
            "10.0.0.128"
        ],
        "cpu_count": 4,
        "power_type": "ipmi",
        "tag_names": [],
        "swap_size": null,
        "owner": "bladernr",
        "macaddress_set": [
            {
                "mac_address": "00:30:48:65:5e:0c"
            },
            {
                "mac_address": "00:30:48:65:5e:0d"
            }
        ],
        "zone": {
            "resource_uri": "/MAAS/api/1.0/zones/Rack2/",
            "name": "Rack2",
            "description": "Rack2"
        },
        "hostname": "supermicro.maas",
        "storage": 4000797,
        "substatus_message": "'cloudinit' running modules for final",
        "system_id": "node-28402790-92c5-11e5-a8a3-eca86bfb9f66",
        "boot_type": "fastpath",
        "memory": 4096,
        "substatus_action": "modules-final",
        "disable_ipv4": false,
        "architecture": "amd64/generic",
        "status": 6,
        "power_state": "on",
        "substatus_name": "Failed deployment",
        "routers": [
            "5c:f4:ab:f8:5b:a4",
            "5c:f4:ab:f8:5b:a4"
        ],
        "physicalblockdevice_set": [
            {
                "size": 2000398934016,
                "resource_uri": "/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/blockdevices/5/",
                "uuid": null,
                "name": "sda",
                "tags": [
                    "rotary",
                    "sata",
                    "7200rpm"
                ],
                "type": "physical",
                "id": 5,
                "used_for": "MBR partitioned with 1 partition",
                "partition_table_type": "MBR",
                "filesystem": null,
                "id_path": "/dev/disk/by-id/wwn-0x50014ee0040d0de7",
                "available_size": 0,
                "path": "/dev/disk/by-dname/sda",
                "block_size": 4096,
                "used_size": 2000398843904,
                "model": "WDC WD2004FBYZ-0",
                "serial": "WD-WMC6N0D4Y76V",
                "partitions": [
                    {
                        "uuid": "22fd433f-13f1-4f47-bcd1-a00f181ec828",
                        "bootable": false,
                        "used_for": "Active raid-0 device for md0",
                        "filesystem": {
                            "mount_point": null,
                            "uuid": "4da2895f-5dd5-4939-8dc0-15159fe927c8",
                            "fstype": "raid",
                            "label": null
                        },
                        "path": "/dev/disk/by-dname/sda-part1",
                        "resource_uri": "/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/blockdevices/5/partition/8",
                        "type": "partition",
                        "id": 8,
                        "size": 2000393601024
                    }
                ]
            },
            {
                "size": 2000398934016,
                "resource_uri": "/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3...

Revision history for this message
Blake Rouse (blake-rouse) wrote :

You provided the node event log. I need the node installation log which is all the way at the bottom of the page.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Here is the log from a successful deployment after re-commissioning and having it reset the storage config to the default flat config:

Node status event - 'cloudinit' running modules for final Wed, 25 Nov. 2015 15:11:09
Node status event - 'cloudinit' config-power-state-change ran successfully Wed, 25 Nov. 2015 15:11:09
Node status event - 'cloudinit' running config-power-state-change with frequency once-per-instance Wed, 25 Nov. 2015 15:11:09
Node status event - 'cloudinit' config-final-message ran successfully Wed, 25 Nov. 2015 15:11:09
Node status event - 'cloudinit' running config-final-message with frequency always Wed, 25 Nov. 2015 15:11:09
Node status event - 'cloudinit' config-phone-home ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-phone-home with frequency once-per-instance Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' config-keys-to-console ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-keys-to-console with frequency once-per-instance Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' config-ssh-authkey-fingerprints ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-ssh-authkey-fingerprints with frequency once-per-instance Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' config-scripts-user ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-scripts-user with frequency once-per-instance Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' config-scripts-per-instance ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-scripts-per-instance with frequency once-per-instance Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' config-scripts-per-boot ran successfully Wed, 25 Nov. 2015 15:11:08
Node status event - 'cloudinit' running config-scripts-per-boot with frequency always Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' config-scripts-per-once ran successfully Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' running config-scripts-per-once with frequency once Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' config-scripts-vendor ran successfully Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' running config-scripts-vendor with frequency once-per-instance Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' config-rightscale_userdata ran successfully Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' running config-rightscale_userdata with frequency once-per-instance Wed, 25 Nov. 2015 15:11:07
Node status event - 'cloudinit' running modules for config Wed, 25 Nov. 2015 15:11:06
Node status event - 'cloudinit' config-byobu ran successfully Wed, 25 Nov. 2015 15:11:06
Node status event - 'cloudinit' running config-byobu with frequency once-per-instance Wed, 25 Nov. 2015 15:11:06
Node status event - 'cloudinit' config-runcmd ran successfully Wed, 25 Nov. 2015 15:11:06
Node status event - 'cloudinit' running config-runcmd with frequency once-per-instance Wed, 25 Nov. 2015 15:11:06
Node status event - 'cloudinit' config-disable-ec2-metad...

Revision history for this message
Jeff Lane  (bladernr) wrote :

{u'hwe_kernel': u'hwe-w', u'ip_addresses': [u'10.0.0.128'], u'cpu_count': 4, u'power_type': u'ipmi', u'tag_names': [], u'swap_size': None, u'owner': u'bladernr', u'macaddress_set': [{u'mac_address': u'00:30:48:65:5e:0d'}, {u'mac_address': u'00:30:48:65:5e:0c'}], u'zone': {u'description': u'Rack2', u'name': u'Rack2', u'resource_uri': u'/MAAS/api/1.0/zones/Rack2/'}, u'hostname': u'supermicro.maas', u'storage': 4000797, u'substatus_message': u"'cloudinit' running modules for final", u'system_id': u'node-28402790-92c5-11e5-a8a3-eca86bfb9f66', u'boot_type': u'fastpath', u'memory': 4096, u'substatus_action': u'modules-final', u'disable_ipv4': False, u'min_hwe_kernel': u'', u'status': 6, u'power_state': u'on', u'substatus_name': u'Deployed', u'routers': [u'5c:f4:ab:f8:5b:a4', u'5c:f4:ab:f8:5b:a4'], u'physicalblockdevice_set': [{u'model': u'WDC WD2004FBYZ-0', u'block_size': 4096, u'available_size': 0, u'uuid': None, u'tags': [u'rotary', u'sata', u'7200rpm'], u'used_size': 2000398843904, u'partitions': [{u'uuid': u'f946670f-ebf1-4482-adf5-3f4960825463', u'bootable': False, u'used_for': u'ext4 formatted filesystem mounted at /', u'filesystem': {u'label': u'root', u'mount_point': u'/', u'uuid': u'c4fd26f4-6953-4c8f-a947-bac3357af5cc', u'fstype': u'ext4'}, u'path': u'/dev/disk/by-dname/sda-part1', u'size': 2000393601024, u'type': u'partition', u'id': 10, u'resource_uri': u'/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/blockdevices/5/partition/10'}], u'name': u'sda', u'partition_table_type': u'MBR', u'filesystem': None, u'id_path': u'/dev/disk/by-id/wwn-0x50014ee0040d0de7', u'used_for': u'MBR partitioned with 1 partition', u'path': u'/dev/disk/by-dname/sda', u'size': 2000398934016, u'type': u'physical', u'id': 5, u'serial': u'WD-WMC6N0D4Y76V', u'resource_uri': u'/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/blockdevices/5/'}, {u'model': u'WDC WD2004FBYZ-0', u'block_size': 4096, u'available_size': 2000398934016, u'uuid': None, u'tags': [u'rotary', u'sata', u'7200rpm'], u'used_size': 0, u'partitions': [], u'name': u'sdb', u'partition_table_type': None, u'filesystem': None, u'id_path': u'/dev/disk/by-id/wwn-0x50014ee0aeb7cbe7', u'used_for': u'Unused', u'path': u'/dev/disk/by-dname/sdb', u'size': 2000398934016, u'type': u'physical', u'id': 6, u'serial': u'WD-WMC6N0D8AS39', u'resource_uri': u'/MAAS/api/1.0/nodes/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/blockdevices/6/'}], u'boot_disk': None, u'pxe_mac': {u'mac_address': u'00:30:48:65:5e:0c'}, u'netboot': False, u'osystem': u'ubuntu', u'substatus': 6, u'virtualblockdevice_set': [], u'architecture': u'amd64/generic', u'interface_set': [{u'name': u'eth1', u'links': [{u'id': 105, u'mode': u'link_up'}], u'tags': [], u'vlan': {u'id': 0, u'fabric': u'fabric-0', u'name': u'untagged', u'vid': 0, u'resource_uri': u'/MAAS/api/1.0/vlans/0/'}, u'enabled': True, u'effective_mtu': 1500, u'children': [], u'discovered': [{u'subnet': {u'dns_servers': [], u'name': u'10.0.0.0/24', u'space': u'space-0', u'vlan': {u'id': 0, u'fabric': u'fabric-0', u'name': u'untagged', u'vid': 0, u'resource_uri': u'/MAAS/api/1.0/vlans/0/'}, u'gateway_ip': u'10.0.0.1', u'cidr': u'10.0.0.0/24', u'id': 1, u'resource_ur...


Revision history for this message
Jeff Lane  (bladernr) wrote :

ahhh... crap... ok. give me a few

Revision history for this message
Jeff Lane  (bladernr) wrote :

mdadm: No arrays found in config file or automatically
mdadm: No arrays found in config file or automatically
Creating new GPT entries.
The operation has completed successfully.
The operation has completed successfully.
mdadm: Unrecognised md component device - /dev/sda1
mdadm: Unrecognised md component device - /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
--2015-11-25 20:29:31-- http://10.0.0.1:5248/images/ubuntu/amd64/generic/wily/release/root-tgz
Connecting to 10.0.0.1:5248... connected.
HTTP request sent, awaiting response... 200 OK
Length: 389850048 (372M) [text/html]
Saving to: 'STDOUT'

     0K ........ ........ ........ ........ ........ ........ 0% 9.60M 38s
  3072K ........ ........ ........ ........ ........ ........ 1% 33.0M 25s
  6144K ........ ........ ........ ........ ........ ........ 2% 39.0M 19s
  9216K ........ ........ ........ ........ ........ ........ 3% 42.3M 17s
 12288K ........ ........ ........ ........ ........ ........ 4% 33.1M 15s
 15360K ........ ........ ........ ........ ........ ........ 4% 29.5M 15s
 18432K ........ ........ ........ ........ ........ ........ 5% 28.6M 14s
 21504K ........ ........ ........ ........ ........ ........ 6% 30.6M 14s
 24576K ........ ........ ........ ........ ........ ........ 7% 29.3M 13s
 27648K ........ ........ ........ ........ ........ ........ 8% 28.4M 13s
 30720K ........ ........ ........ ........ ........ ........ 8% 30.9M 13s
 33792K ........ ........ ........ ........ ........ ........ 9% 31.5M 13s
 36864K ........ ........ ........ ........ ........ ........ 10% 25.7M 12s
 39936K ........ ........ ........ ........ ........ ........ 11% 29.4M 12s
 43008K ........ ........ ........ ........ ........ ........ 12% 27.9M 12s
 46080K ........ ........ ........ ........ ........ ........ 12% 29.3M 12s
 49152K ........ ........ ........ ........ ........ ........ 13% 30.6M 12s
 52224K ........ ........ ........ ........ ........ ........ 14% 28.9M 12s
 55296K ........ ........ ........ ........ ........ ........ 15% 29.4M 11s
 58368K ........ ........ ........ ........ ........ ........ 16% 31.0M 11s
 61440K ........ ........ ........ ........ ........ ........ 16% 31.4M 11s
 64512K ........ ........ ........ ........ ........ ........ 17% 33.8M 11s
 67584K ........ ........ ........ ........ ........ ........ 18% 27.7M 11s
 70656K ........ ........ ........ ........ ........ ........ 19% 32.0M 11s
 73728K ........ ........ ........ ........ ........ ........ 20% 56.9M 10s
 76800K ........ ........ ........ ........ ........ ........ 20% 40.1M 10s
 79872K ........ ........ ........ ........ ........ ........ 21% 39.6M 10s
 82944K ........ ........ ........ ........ ........ ........ 22% 40.0M 10s
 86016K ........ ........ ........ ........ ........ ........ 23% 37.5M 10s
 89088K ........ ........ ........ ........ ........ ........ 24% 38.7M 9s
 92160K ........ ........ ........ ........ ........ ........ 25% 46.2M 9s
 95232K ........ ........ ........ ........ ........ ........ 25% 50.7M 9s
 98304K ........ ........ ........ ........ ........ ........ 26% 29.9M 9s
101376K ........ ........ ........ ........ ........ ......

Revision history for this message
Jeff Lane  (bladernr) wrote :

This is on the same machine, immediately following the failure above with a recommission and using the default flat scheme:

mdadm: /dev/md/0 has been started with 2 drives.
mdadm: stopped /dev/md0
mdadm: error opening /dev/md0: No such file or directory
mdadm: stopped /dev/md0
mdadm: error opening /dev/md0: No such file or directory
--2015-11-25 20:41:27-- http://10.0.0.1:5248/images/ubuntu/amd64/generic/wily/release/root-tgz
Connecting to 10.0.0.1:5248... connected.
HTTP request sent, awaiting response... 200 OK
Length: 389850048 (372M) [text/html]
Saving to: 'STDOUT'

     0K ........ ........ ........ ........ ........ ........ 0% 11.1M 33s
  3072K ........ ........ ........ ........ ........ ........ 1% 30.5M 22s
  6144K ........ ........ ........ ........ ........ ........ 2% 36.4M 18s
  9216K ........ ........ ........ ........ ........ ........ 3% 41.9M 16s
 12288K ........ ........ ........ ........ ........ ........ 4% 33.0M 15s
 15360K ........ ........ ........ ........ ........ ........ 4% 32.3M 14s
 18432K ........ ........ ........ ........ ........ ........ 5% 29.1M 14s
 21504K ........ ........ ........ ........ ........ ........ 6% 29.9M 13s
 24576K ........ ........ ........ ........ ........ ........ 7% 30.7M 13s
 27648K ........ ........ ........ ........ ........ ........ 8% 30.1M 13s
 30720K ........ ........ ........ ........ ........ ........ 8% 34.9M 12s
 33792K ........ ........ ........ ........ ........ ........ 9% 31.0M 12s
 36864K ........ ........ ........ ........ ........ ........ 10% 29.4M 12s
 39936K ........ ........ ........ ........ ........ ........ 11% 29.1M 12s
 43008K ........ ........ ........ ........ ........ ........ 12% 29.0M 12s
 46080K ........ ........ ........ ........ ........ ........ 12% 29.0M 11s
 49152K ........ ........ ........ ........ ........ ........ 13% 26.4M 11s
 52224K ........ ........ ........ ........ ........ ........ 14% 29.5M 11s
 55296K ........ ........ ........ ........ ........ ........ 15% 29.9M 11s
 58368K ........ ........ ........ ........ ........ ........ 16% 30.8M 11s
 61440K ........ ........ ........ ........ ........ ........ 16% 31.9M 11s
 64512K ........ ........ ........ ........ ........ ........ 17% 33.0M 11s
 67584K ........ ........ ........ ........ ........ ........ 18% 25.1M 11s
 70656K ........ ........ ........ ........ ........ ........ 19% 30.5M 10s
 73728K ........ ........ ........ ........ ........ ........ 20% 47.5M 10s
 76800K ........ ........ ........ ........ ........ ........ 20% 45.4M 10s
 79872K ........ ........ ........ ........ ........ ........ 21% 40.6M 10s
 82944K ........ ........ ........ ........ ........ ........ 22% 40.3M 10s
 86016K ........ ........ ........ ........ ........ ........ 23% 36.1M 9s
 89088K ........ ........ ........ ........ ........ ........ 24% 35.8M 9s
 92160K ........ ........ ........ ........ ........ ........ 25% 43.5M 9s
 95232K ........ ........ ........ ........ ........ ........ 25% 44.9M 9s
 98304K ........ ........ ........ ........ ........ ........ 26% 48.5M 9s
101376K ........ ........ ........ ........ ........ ........ 27% 39.8M 9s
104448K ........ ........ ........ ...........

Changed in maas:
status: Incomplete → New
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Looks to be an issue with curtin and apt-get installing grub and choking on /dev/md0.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Changed in curtin:
importance: Undecided → Critical
Changed in maas:
milestone: none → 1.9.0
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Since this is a curtin issue, could you please provide the curtin config for the node? Once the node has failed the deployment and is in the failed state, please provide the output of the following:

maas <session> node get-curtin-config <system-id>

Also have you tried other Ubuntu releases or only wily?
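
For reference, with the profile name as a placeholder and the system ID from this bug, that call would be:

maas myprofile node get-curtin-config node-28402790-92c5-11e5-a8a3-eca86bfb9f66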

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've successfully run mdadm with 15.10 on raid-10, raid-5 and raid-6 as / devices - I doubt raid-1 should be any different.

Blake Rouse - can you guide Jeff to provide a full install log with "-vvv" as we currently use it in the other bug we work on together?
That would be great.

That, together with the config Jeff already provided in comment #6, should help me recreate the issue and debug it.

Changed in curtin:
status: New → Triaged
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I checked it once more; we have had RAID level 1 tested all along.
I also quickly added Trusty just to be sure, but T/V/W are all working in my case - so we have to find the difference in your definition.

I realized the config listed above in comment #6 is a bit short; I didn't see that this morning.
Blake - could you also guide him to get and upload a yaml config of his case?

As a reference here my test yaml http://paste.ubuntu.com/13514162/

Revision history for this message
Jeff Lane  (bladernr) wrote :

Blake:

Machine-readable output follows:
apt_mirrors:
  ubuntu_archive: http://us.archive.ubuntu.com//ubuntu/
  ubuntu_security: http://us.archive.ubuntu.com//ubuntu/
apt_proxy: http://10.0.0.1:8000/
debconf_selections:
  maas: 'cloud-init cloud-init/datasources multiselect MAAS

    cloud-init cloud-init/maas-metadata-url string http://10.0.0.1/MAAS/metadata/

    cloud-init cloud-init/maas-metadata-credentials string oauth_token_key=GBLKfyTX7EynSfXHzf&oauth_token_secret=NGzpKe8ynYqDB9NkS3E4XWV28N5JAFMP&oauth_consumer_key=NfUA6K5QhdM3uuwkGg

    cloud-init cloud-init/local-cloud-config string apt_preserve_sources_list:
    true\napt_proxy: http://10.0.0.1:8000/\nmanage_etc_hosts: false\nmanual_cache_clean:
    true\nreporting:\n maas: {consumer_key: NfUA6K5QhdM3uuwkGg, endpoint: ''http://10.0.0.1/MAAS/metadata/status/node-28402790-92c5-11e5-a8a3-eca86bfb9f66'',\n token_key:
    GBLKfyTX7EynSfXHzf, token_secret: NGzpKe8ynYqDB9NkS3E4XWV28N5JAFMP,\n type:
    webhook}\nsystem_info:\n package_mirrors:\n - arches: [i386, amd64]\n failsafe:
    {primary: ''http://archive.ubuntu.com/ubuntu'', security: ''http://security.ubuntu.com/ubuntu''}\n search:\n primary:
    [''http://us.archive.ubuntu.com/ubuntu/'']\n security: [''http://us.archive.ubuntu.com/ubuntu/'']\n -
    arches: [default]\n failsafe: {primary: ''http://ports.ubuntu.com/ubuntu-ports'',
    security: ''http://ports.ubuntu.com/ubuntu-ports''}\n search:\n primary:
    [''http://ports.ubuntu.com/ubuntu-ports'']\n security: [''http://ports.ubuntu.com/ubuntu-ports'']\n

    '
install:
  log_file: /tmp/install.log
  post_files:
  - /tmp/install.log
kernel:
  mapping: {}
  package: linux-generic
late_commands:
  maas:
  - wget
  - --no-proxy
  - http://10.0.0.1/MAAS/metadata/latest/by-id/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/
  - --post-data
  - op=netboot_off
  - -O
  - /dev/null
network:
  config:
  - id: eth0
    mac_address: 00:30:48:65:5e:0c
    mtu: 1500
    name: eth0
    subnets:
    - address: 10.0.0.128/24
      dns_nameservers: []
      gateway: 10.0.0.1
      type: static
    type: physical
  - id: eth1
    mac_address: 00:30:48:65:5e:0d
    mtu: 1500
    name: eth1
    subnets:
    - type: manual
    type: physical
  - address: 10.0.0.1
    search:
    - maas
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom
partitioning_commands:
  builtin:
  - curtin
  - block-meta
  - custom
power_state:
  mode: reboot
reporting:
  maas:
    consumer_key: NfUA6K5QhdM3uuwkGg
    endpoint: http://10.0.0.1/MAAS/metadata/status/node-28402790-92c5-11e5-a8a3-eca86bfb9f66
    token_key: GBLKfyTX7EynSfXHzf
    token_secret: NGzpKe8ynYqDB9NkS3E4XWV28N5JAFMP
    type: webhook
storage:
  config:
  - grub_device: true
    id: sda
    model: WDC WD2004FBYZ-0
    name: sda
    ptable: msdos
    serial: WD-WMC6N0D4Y76V
    type: disk
    wipe: superblock
  - id: sdb
    model: WDC WD2004FBYZ-0
    name: sdb
    ptable: gpt
    serial: WD-WMC6N0D8AS39
    type: disk
    wipe: superblock
  - device: sda
    id: sda-part1
    name: sda-part1
    number: 1
    offset: 4194304B
    size: 2000393601024B
    ty...


Revision history for this message
Jeff Lane  (bladernr) wrote :

Blake, I've now tried with Trusty, Vivid and Wily from Releases using the exact same config, and it fails on every one.

Revision history for this message
Jeff Lane  (bladernr) wrote :

If you could tell me how to get logs using -vvv (Christian's suggestion) I'll do that and post them to the bug. I'm also going to re-try with raid0.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Also tried RAID0 with wily and that too failed.
Machine-readable output follows:
apt_mirrors:
  ubuntu_archive: http://us.archive.ubuntu.com//ubuntu/
  ubuntu_security: http://us.archive.ubuntu.com//ubuntu/
apt_proxy: http://10.0.0.1:8000/
debconf_selections:
  maas: 'cloud-init cloud-init/datasources multiselect MAAS

    cloud-init cloud-init/maas-metadata-url string http://10.0.0.1/MAAS/metadata/

    cloud-init cloud-init/maas-metadata-credentials string oauth_token_key=BXbzu9PEvBBb4mDdLU&oauth_token_secret=8SquwnRT8ptAaJcefvCTpYaRVYW7JZaP&oauth_consumer_key=fRkhnNk3kzQgUQvJ7s

    cloud-init cloud-init/local-cloud-config string apt_preserve_sources_list:
    true\napt_proxy: http://10.0.0.1:8000/\nmanage_etc_hosts: false\nmanual_cache_clean:
    true\nreporting:\n maas: {consumer_key: fRkhnNk3kzQgUQvJ7s, endpoint: ''http://10.0.0.1/MAAS/metadata/status/node-28402790-92c5-11e5-a8a3-eca86bfb9f66'',\n token_key:
    BXbzu9PEvBBb4mDdLU, token_secret: 8SquwnRT8ptAaJcefvCTpYaRVYW7JZaP,\n type:
    webhook}\nsystem_info:\n package_mirrors:\n - arches: [i386, amd64]\n failsafe:
    {primary: ''http://archive.ubuntu.com/ubuntu'', security: ''http://security.ubuntu.com/ubuntu''}\n search:\n primary:
    [''http://us.archive.ubuntu.com/ubuntu/'']\n security: [''http://us.archive.ubuntu.com/ubuntu/'']\n -
    arches: [default]\n failsafe: {primary: ''http://ports.ubuntu.com/ubuntu-ports'',
    security: ''http://ports.ubuntu.com/ubuntu-ports''}\n search:\n primary:
    [''http://ports.ubuntu.com/ubuntu-ports'']\n security: [''http://ports.ubuntu.com/ubuntu-ports'']\n

    '
install:
  log_file: /tmp/install.log
  post_files:
  - /tmp/install.log
kernel:
  mapping: {}
  package: linux-generic
late_commands:
  maas:
  - wget
  - --no-proxy
  - http://10.0.0.1/MAAS/metadata/latest/by-id/node-28402790-92c5-11e5-a8a3-eca86bfb9f66/
  - --post-data
  - op=netboot_off
  - -O
  - /dev/null
network:
  config:
  - id: eth0
    mac_address: 00:30:48:65:5e:0c
    mtu: 1500
    name: eth0
    subnets:
    - address: 10.0.0.128/24
      dns_nameservers: []
      gateway: 10.0.0.1
      type: static
    type: physical
  - id: eth1
    mac_address: 00:30:48:65:5e:0d
    mtu: 1500
    name: eth1
    subnets:
    - type: manual
    type: physical
  - address: 10.0.0.1
    search:
    - maas
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom
partitioning_commands:
  builtin:
  - curtin
  - block-meta
  - custom
power_state:
  mode: reboot
reporting:
  maas:
    consumer_key: fRkhnNk3kzQgUQvJ7s
    endpoint: http://10.0.0.1/MAAS/metadata/status/node-28402790-92c5-11e5-a8a3-eca86bfb9f66
    token_key: BXbzu9PEvBBb4mDdLU
    token_secret: 8SquwnRT8ptAaJcefvCTpYaRVYW7JZaP
    type: webhook
storage:
  config:
  - grub_device: true
    id: sda
    model: WDC WD2004FBYZ-0
    name: sda
    ptable: msdos
    serial: WD-WMC6N0D4Y76V
    type: disk
    wipe: superblock
  - id: sdb
    model: WDC WD2004FBYZ-0
    name: sdb
    ptable: gpt
    serial: WD-WMC6N0D8AS39
    type: disk
    wipe: superblock
  - device: sda
    id: sda-part1
    name: sda-part1
    number: 1
    offset: ...


Revision history for this message
Blake Rouse (blake-rouse) wrote :

Jeff,

maas admin maas set-config name=curtin_verbose value=True

Will enable verbose output. You will see the output of the installation log contains a lot more information.
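
If you want to confirm the setting took, or turn it back off afterwards (assuming the same "admin" profile), something like:

maas admin maas get-config name=curtin_verbose
maas admin maas set-config name=curtin_verbose value=False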

Revision history for this message
Jeff Lane  (bladernr) wrote :

Here's the log from the UI from another failed deployment after turning on verbose output per your comment #21:

Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
mdadm: /dev/md/0 has been started with 2 drives.
clear_holders running on '/sys/block/sda/sda1', with holders '['md0']'
clear_holders running on '/sys/devices/virtual/block/md0', with holders '[]'
stopping: /dev/md0
Running command ['mdadm', '--stop', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: stopped /dev/md0
Running command ['mdadm', '--remove', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: error opening /dev/md0: No such file or directory
Running command ['sgdisk', '--zap-all', '/dev/sda1'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
clear_holders running on '/sys/block/sda', with holders '[]'
Running command ['sgdisk', '--zap-all', '/dev/sda'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
labeling device: '/dev/sda' with 'msdos' partition table
Running command ['parted', '/dev/sda', '--script', 'mklabel', 'msdos'] with allowed return codes [0] (shell=False, capture=False)
Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['blkid', '-o', 'export', '/dev/sda'] with allowed return codes [0, 2] (shell=False, capture=True)
Running command ['partprobe', '/dev/sdb'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
clear_holders running on '/sys/block/sdb/sdb1', with holders '['md0']'
clear_holders running on '/sys/devices/virtual/block/md0', with holders '[]'
stopping: /dev/md0
Running command ['mdadm', '--stop', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: stopped /dev/md0
Running command ['mdadm', '--remove', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: error opening /dev/md0: No such file or directory
Running command ['sgdisk', '--zap-all', '/dev/sdb1'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
clear_holders running on '/sys/block/sdb', with holders '[]'
Running command ['sgdisk', '--zap-all', '/dev/sdb'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
labeling device: '/dev/sdb' with 'gpt' partition table
Running command ['sgdisk', '--clear', '/dev/sdb'] with allowed return codes [0] (shell=False, capture=False)
Creating new GPT entries.
The operation has completed successfully.
Running command ['partprobe', '/dev/sdb'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle...

Jeff Lane  (bladernr)
summary: - Deployment always fails when creating a RAID storage config
+ Deployment always fails when creating a custom storage config
Revision history for this message
Jeff Lane  (bladernr) wrote : Re: Deployment always fails when creating a custom storage config

OK, new data point... I got another failure with a customized flat config. The following is the current custom flat config I've just attempted:

File systems
Name       Size    Mountpoint  File system
sda-part1  2.0 TB  /windows    vfat
sdb-part1  2.0 TB  /           ext4

Used disks and partitions
Name|Model|Serial   Boot   Device type   Used for
sda                        Physical      MBR partitioned with 1 partition
sda-part1                  Partition     vfat formatted filesystem mounted at /windows
sdb                        Physical      GPT partitioned with 1 partition
sdb-part1                  Partition     ext4 formatted filesystem mounted at /

There was no install log in the web UI, unfortunately; I will try to recreate this and see if I can get one to appear. In the meantime, I have added the contents of /var/log from this new failure case.

Revision history for this message
Jeff Lane  (bladernr) wrote :

I tried three more times but got no install info in the web UI from this fail case :(

Revision history for this message
Andres Rodriguez (andreserl) wrote :

I wonder if the reason is related to creating such large partitions?

sda-part1 2.0 TB /windows vfat
sdb-part1 2.0 TB / ext4

?

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Or using multiple disks?

Revision history for this message
Jeff Lane  (bladernr) wrote :

Tried again with the latest 1.9 bits from maas/proposed.

Attempted a RAID0 once more after zeroing the two disks manually, then attempting to deploy. This is the install log from the maas UI:

Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Error: /dev/sda: unrecognised disk label
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
mdadm: No arrays found in config file or automatically
clear_holders running on '/sys/block/sda', with holders '[]'
Running command ['sgdisk', '--zap-all', '/dev/sda'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
labeling device: '/dev/sda' with 'msdos' partition table
Running command ['parted', '/dev/sda', '--script', 'mklabel', 'msdos'] with allowed return codes [0] (shell=False, capture=False)
Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['blkid', '-o', 'export', '/dev/sda'] with allowed return codes [0, 2] (shell=False, capture=True)
Can't find a uuid for volume: sda. Skipping dname.
Running command ['partprobe', '/dev/sdb'] with allowed return codes [0, 1] (shell=False, capture=False)
Error: /dev/sdb: unrecognised disk label
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
mdadm: No arrays found in config file or automatically
clear_holders running on '/sys/block/sdb', with holders '[]'
Running command ['sgdisk', '--zap-all', '/dev/sdb'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
labeling device: '/dev/sdb' with 'gpt' partition table
Running command ['sgdisk', '--clear', '/dev/sdb'] with allowed return codes [0] (shell=False, capture=False)
Creating new GPT entries.
The operation has completed successfully.
Running command ['partprobe', '/dev/sdb'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['blkid', '-o', 'export', '/dev/sdb'] with allowed return codes [0, 2] (shell=False, capture=True)
Can't find a uuid for volume: sdb. Skipping dname.
Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
adding partition 'sda-part1' to disk 'sda'
Running command ['parted', '/dev/sda', '--script', 'mkpart', 'primary', '2048s', '3907020799s'] with allowed return codes [0] (shell=False, capture=False)
Running command ['partprobe', '/dev/sda'] with allowed return codes ...


Revision history for this message
Jeff Lane  (bladernr) wrote :

bah, truncated, here it is from the traceback down:

Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 208, in main
    ret = args.func(args)
  File "curtin/commands/block_meta.py", line 63, in block_meta
    meta_custom(args)
  File "curtin/commands/block_meta.py", line 1106, in meta_custom
    handler(command, storage_config_dict)
  File "curtin/commands/block_meta.py", line 937, in raid_handler
    util.subp(" ".join(cmd), shell=True)
  File "curtin/util.py", line 99, in subp
    return _subp(*args, **kwargs)
  File "curtin/util.py", line 70, in _subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: mdadm --create /dev/md0 --run --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Unexpected error while running command.
Command: mdadm --create /dev/md0 --run --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -

Revision history for this message
Jeff Lane  (bladernr) wrote :

I then ssh'd to the node and attempted to manually build the array using the same command curtin was using (note: I added -v to see if I could get some additional info about the failure):

ubuntu@supermicro:~$ sudo mdadm -v -v --create /dev/md0 --run --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
sudo: unable to resolve host supermicro
mdadm: chunk size defaults to 512K
mdadm: /dev/sda1 appears to be part of a raid array:
    level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
mdadm: creation continuing despite oddities due to --run
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Invalid argument

Revision history for this message
Jeff Lane  (bladernr) wrote :

Interestingly, I then shut the node down, removed the RAID0 scheme, and replaced it with a custom flat scheme. This time, not only was the deployment successful, BUT it appears that the installer assembled my OLD devices as a software RAID0 before clearing them:

Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
mdadm: failed to RUN_ARRAY /dev/md/0: Invalid argument
clear_holders running on '/sys/block/sda/sda1', with holders '['md0']'
clear_holders running on '/sys/devices/virtual/block/md0', with holders '[]'
stopping: /dev/md0
Running command ['mdadm', '--stop', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: stopped /dev/md0
Running command ['mdadm', '--remove', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: error opening /dev/md0: No such file or directory
Running command ['sgdisk', '--zap-all', '/dev/sda1'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
clear_holders running on '/sys/block/sda', with holders '[]'
Running command ['sgdisk', '--zap-all', '/dev/sda'] with allowed return codes [0, 1, 2, 5] (shell=False, capture=True)
labeling device: '/dev/sda' with 'msdos' partition table
Running command ['parted', '/dev/sda', '--script', 'mklabel', 'msdos'] with allowed return codes [0] (shell=False, capture=False)
Running command ['partprobe', '/dev/sda'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['blkid', '-o', 'export', '/dev/sda'] with allowed return codes [0, 2] (shell=False, capture=True)
Can't find a uuid for volume: sda. Skipping dname.
Running command ['partprobe', '/dev/sdb'] with allowed return codes [0, 1] (shell=False, capture=False)
Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=False)
Running command ['mdadm', '--assemble', '--scan'] with allowed return codes [0, 1, 2] (shell=False, capture=False)
mdadm: /dev/md/0 assembled from 1 drive - not enough to start the array.
clear_holders running on '/sys/block/sdb/sdb1', with holders '['md0']'
clear_holders running on '/sys/devices/virtual/block/md0', with holders '[]'
stopping: /dev/md0
Running command ['mdadm', '--stop', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: stopped /dev/md0
Running command ['mdadm', '--remove', '/dev/md0'] with allowed return codes [0, 1] (shell=False, capture=False)
mdadm: error opening /dev/md0: No such file or directory

So the raid members are being created successfully, but for whatever reason, they can't be used when I actually WANT a RAID setup.

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1519470] Re: Deployment always fails when creating a custom storage config

On Thu, Jan 14, 2016 at 1:34 PM, Jeff Lane <email address hidden>
wrote:

> I then ssh'd to the node and attempted to manually build the array using
> the same command curtin was using: (Note I added -v to see if I could
> get some additional info from the failure)
>
> ubuntu@supermicro:~$ sudo mdadm -v -v --create /dev/md0 --run --level=0
> --raid-devices=2 /dev/sda1 /dev/sdb1
> sudo: unable to resolve host supermicro
> mdadm: chunk size defaults to 512K
> mdadm: /dev/sda1 appears to be part of a raid array:
> level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
> mdadm: /dev/sdb1 appears to be part of a raid array:
> level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
> mdadm: creation continuing despite oddities due to --run
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Invalid argument
>

Manually we really need to run:

1. mdadm --stop /dev/md0; mdadm --zero-superblock /dev/sda1; mdadm --zero-superblock /dev/sdb1
2. then re-run the create; and optionally run it without --run.
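
Putting those steps together, a sketch of the manual reset-and-retry sequence (using the device names from this bug; run as root on the node):

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1
mdadm --zero-superblock /dev/sdb1
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1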

Revision history for this message
Jeff Lane  (bladernr) wrote :

In any case, the flat filesystem is successful now. But software RAID is broken still.

Revision history for this message
Ryan Harper (raharper) wrote :

Also can you confirm raid module is loaded?
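
A quick way to check (and to try loading it) from the ephemeral environment, assuming the stock module names:

lsmod | grep -E 'raid|md_mod'
sudo modprobe raid0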

On Thu, Jan 14, 2016 at 1:34 PM, Jeff Lane <email address hidden>
wrote:

> I then ssh'd to the node and attempted to manually build the array using
> the same command curtin was using: (Note I added -v to see if I could
> get some additional info from the failure)
>
> ubuntu@supermicro:~$ sudo mdadm -v -v --create /dev/md0 --run --level=0
> --raid-devices=2 /dev/sda1 /dev/sdb1
> sudo: unable to resolve host supermicro
> mdadm: chunk size defaults to 512K
> mdadm: /dev/sda1 appears to be part of a raid array:
> level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
> mdadm: /dev/sdb1 appears to be part of a raid array:
> level=raid0 devices=2 ctime=Thu Jan 14 19:29:50 2016
> mdadm: creation continuing despite oddities due to --run
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Invalid argument
>
> --
> You received this bug notification because you are subscribed to curtin.
> Matching subscriptions: curtin-bugs-all
> https://bugs.launchpad.net/bugs/1519470
>
> Title:
> Deployment always fails when creating a custom storage config
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/curtin/+bug/1519470/+subscriptions
>

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: Deployment always fails when creating a custom storage config

Ok, I did as you suggested and it still fails:
root@supermicro:~# mdadm --stop /dev/md0; mdadm --zero-superblock /dev/sda1; mdadm --zero-superblock /dev/sdb1
mdadm: error opening /dev/md0: No such file or directory
mdadm: Unrecognised md component device - /dev/sda1
root@supermicro:~# mdadm -v -v --create /dev/md0 --run --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Invalid argument

FWIW, this is the version of mdadm used by the ephemeral
root@supermicro:~# mdadm --version
mdadm - v3.2.5 - 18th May 2012

and finally, I do not believe the RAID module is actually loaded.

root@supermicro:~# lsmod
Module Size Used by
dm_crypt 24576 0
overlay 45056 1
iscsi_tcp 20480 2
libiscsi_tcp 28672 1 iscsi_tcp
libiscsi 57344 2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi 102400 3 iscsi_tcp,libiscsi
hid_logitech_dj 20480 0
hid_generic 16384 0
i2c_algo_bit 16384 0
ttm 94208 0
drm_kms_helper 126976 0
psmouse 114688 0
drm 344064 2 ttm,drm_kms_helper
ahci 36864 0
libahci 32768 1 ahci
pata_acpi 16384 0
usbhid 53248 0
e1000e 237568 0
hid 110592 3 hid_generic,usbhid,hid_logitech_dj
ptp 20480 1 e1000e
pps_core 20480 1 ptp

And finally, THAT seems to be the root cause. There are no software RAID modules that I can find on the ephemeral:

root@supermicro:/lib/modules/3.19.0-43-generic/kernel/drivers/md# ls
bcache dm-crypt.ko

Compared to my desktop running Trusty:
bladernr@sulaco:/lib/modules/3.13.0-74-generic/kernel/drivers/md$ ls
bcache dm-crypt.ko dm-multipath.ko dm-snapshot.ko linear.ko raid456.ko
dm-bio-prison.ko dm-delay.ko dm-queue-length.ko dm-switch.ko multipath.ko
dm-bufio.ko dm-flakey.ko dm-raid.ko dm-thin-pool.ko persistent-data
dm-cache-cleaner.ko dm-log.ko dm-region-hash.ko dm-verity.ko raid0.ko
dm-cache.ko dm-log-userspace.ko dm-round-robin.ko dm-zero.ko raid10.ko
dm-cache-mq.ko dm-mirror.ko dm-service-time.ko faulty.ko raid1.ko

but they DO exist on the image mounted on the tmpfs:

from the output of mount:
/dev/sdc on /media/root-ro type ext4 (ro)

root@supermicro:/media/root-ro/lib/modules/3.13.0-74-generic/kernel/drivers/md# ls
bcache dm-crypt.ko dm-multipath.ko dm-snapshot.ko linear.ko raid456.ko
dm-bio-prison.ko dm-delay.ko dm-queue-length.ko dm-switch.ko multipath.ko
dm-bufio.ko dm-flakey.ko dm-raid.ko dm-thin-pool.ko persistent-data
dm-cache-cleaner.ko dm-log.ko dm-region-hash.ko dm-verity.ko raid0.ko
dm-cache.ko dm-log-userspace.ko dm-round-robin.ko dm-zero.ko raid10.ko
dm-cache-mq.ko dm-mirror.ko dm-service-time.ko faulty.ko raid1.ko

But t...


Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1519470] Re: Deployment always fails when creating a custom storage config

On Thu, Jan 14, 2016 at 3:27 PM, Jeff Lane <email address hidden>
wrote:

> Ok, I did as you suggested and it still fails:
> root@supermicro:~# mdadm --stop /dev/md0; mdadm --zero-superblock
> /dev/sda1; mdadm --zero-superblock /dev/sdb1
> mdadm: error opening /dev/md0: No such file or directory
> mdadm: Unrecognised md component device - /dev/sda1
> root@supermicro:~# mdadm -v -v --create /dev/md0 --run --level=0
> --raid-devices=2 /dev/sda1 /dev/sdb1
> mdadm: chunk size defaults to 512K
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Invalid argument
>
> FWIW, this is the version of mdadm used by the ephemeral
> root@supermicro:~# mdadm --version
> mdadm - v3.2.5 - 18th May 2012
>
> and finally, I do not believe the RAID module is actually loaded.
>
> root@supermicro:~# lsmod
> Module Size Used by
> dm_crypt 24576 0
> overlay 45056 1
> iscsi_tcp 20480 2
> libiscsi_tcp 28672 1 iscsi_tcp
> libiscsi 57344 2 libiscsi_tcp,iscsi_tcp
> scsi_transport_iscsi 102400 3 iscsi_tcp,libiscsi
> hid_logitech_dj 20480 0
> hid_generic 16384 0
> i2c_algo_bit 16384 0
> ttm 94208 0
> drm_kms_helper 126976 0
> psmouse 114688 0
> drm 344064 2 ttm,drm_kms_helper
> ahci 36864 0
> libahci 32768 1 ahci
> pata_acpi 16384 0
> usbhid 53248 0
> e1000e 237568 0
> hid 110592 3 hid_generic,usbhid,hid_logitech_dj
> ptp 20480 1 e1000e
> pps_core 20480 1 ptp
>
>
> And finally, THAT seems to be the root cause. There are no software RAID
> modules that I can find on the ephemeral:
>
> root@supermicro:/lib/modules/3.19.0-43-generic/kernel/drivers/md# ls
> bcache dm-crypt.ko
>
> Compared to my desktop running Trusty:
> bladernr@sulaco:/lib/modules/3.13.0-74-generic/kernel/drivers/md$ ls
> bcache dm-crypt.ko dm-multipath.ko
> dm-snapshot.ko linear.ko raid456.ko
> dm-bio-prison.ko dm-delay.ko dm-queue-length.ko
> dm-switch.ko multipath.ko
> dm-bufio.ko dm-flakey.ko dm-raid.ko
> dm-thin-pool.ko persistent-data
> dm-cache-cleaner.ko dm-log.ko dm-region-hash.ko
> dm-verity.ko raid0.ko
> dm-cache.ko dm-log-userspace.ko dm-round-robin.ko dm-zero.ko
> raid10.ko
> dm-cache-mq.ko dm-mirror.ko dm-service-time.ko faulty.ko
> raid1.ko
>
> but they DO exist on the image mounted on the tmpfs:
>
> from the output of mount:
> /dev/sdc on /media/root-ro type ext4 (ro)
>
> root@supermicro:/media/root-ro/lib/modules/3.13.0-74-generic/kernel/drivers/md#
> ls
> bcache dm-crypt.ko dm-multipath.ko
> dm-snapshot.ko linear.ko raid456.ko
> dm-bio-prison.ko dm-delay.ko dm-queue-length.ko
> dm-switch.ko multipath.ko
> dm-bufio.ko dm-flakey.ko dm-raid.ko
> dm-thin-pool.ko persistent-data
> dm-cache-cleaner.ko dm-log.ko dm-region-hash.ko
> dm-verity.ko raid0.ko
> ...


Revision history for this message
Jeff Lane  (bladernr) wrote :

On Thu, Jan 14, 2016 at 6:02 PM, Ryan Harper <email address hidden> wrote:
> If you reset like I suggested and then modprobe raid0, can you re-run the
> create command successfully?

No, as I said, the image that is running during deployment has no RAID
modules in /lib/modules.

> And finally, THAT seems to be the root cause. There are no software RAID
> modules that I can find on the ephemeral:
>
> root@supermicro:/lib/modules/3.19.0-43-generic/kernel/drivers/md# ls
> bcache dm-crypt.ko
>

The ONLY contents of the md directory are bcache and the dm-crypt
module. The raidX.ko modules are completely missing. So I think this
is an image problem; RAID itself is probably OK.

Revision history for this message
matthew F (matthew-f1989) wrote : Re: Deployment always fails when creating a custom storage config

Jeff, if you have resolved this problem I would be curious to know how you did it. I've been having exactly the same problem, and while I really want a RAID configuration I've been forced to go without because I can't get past this issue.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Update:
I just saw Matthew F's comment and decided to retry. First, I deployed with a RAID0 config using today's Xenial from the daily image stream. This seems to have successfully installed, AND the raid modules are present in the deployment ephemeral:

ubuntu@Y-Wing:~$ lsmod |grep raid
raid10 49152 0
raid456 98304 0
async_raid6_recov 20480 1 raid456
async_memcpy 16384 2 raid456,async_raid6_recov
async_pq 16384 2 raid456,async_raid6_recov
async_xor 16384 3 async_pq,raid456,async_raid6_recov
async_tx 16384 5 async_pq,raid456,async_xor,async_memcpy,async_raid6_recov
raid6_pq 102400 4 async_pq,raid456,btrfs,async_raid6_recov
raid1 36864 0
raid0 20480 1
ubuntu@Y-Wing:~$ uname -a
Linux Y-Wing 4.3.0-7-generic #18-Ubuntu SMP Tue Jan 19 15:46:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

** NOTE that I am also using a custom-built version of curtin from trunk because of bug 1533846; I'm not sure if the fix for the Xenial deployment issue has landed in the regular curtin packaging yet.

ubuntu@Y-Wing:~$ sudo mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Jan 26 19:14:47 2016
     Raid Level : raid0
     Array Size : 3906756608 (3725.77 GiB 4000.52 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Tue Jan 26 19:14:47 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : Y-Wing:0 (local to host Y-Wing)
           UUID : b5f181f4:77763e35:f480aac0:b8865ce1
         Events : 0

    Number Major Minor RaidDevice State
       0 8 1 0 active sync /dev/sda1
       1 8 17 1 active sync /dev/sdb1

Revision history for this message
Jeff Lane  (bladernr) wrote :

Update part 2:
I repeated this with the other images I have on my MAAS server from the daily stream:

Wily: Successfully installed

Trusty: Failed, there are no RAID modules available in the Trusty ephemeral
http://paste.ubuntu.com/14674158/ - cloud-init-output.log that shows the tracebacks. This is what I observed before that led to filing this bug.

Precise: Successfully installed.

The verdict, then, is that the issue is specific to the Trusty image (at least among the ones I tested).

Revision history for this message
Jeff Lane  (bladernr) wrote :

Matthew F:

You'll need to do one of the following to get an installable system:
Either try Wily or Precise instead of Trusty for now.

Or try Xenial. To install Xenial, for the moment, you'll need to grab the trunk for curtin:

bzr branch lp:curtin
cd <path to curtin trunk local branch>
./tools/build-deb

then copy the debs to your MAAS server and

sudo dpkg -i <debs>

then you should be able to successfully deploy Xenial images.

Note I'm using images from the Daily stream, not the Release stream to test all this. YMMV. (I think I remembered everything necessary above)

summary: - Deployment always fails when creating a custom storage config
+ Trusty Deployment w/ RAID storage config fails because Trusty images do
+ not contain RAID kernel modules
Changed in maas-images:
status: New → Confirmed
Changed in curtin:
status: Triaged → Invalid
Revision history for this message
Jeff Lane  (bladernr) wrote :

Marking curtin invalid; this is not a curtin issue. Leaving the MAAS task for now, but that's probably invalid as well. Added the maas-images project because this is a broken image.

Changed in maas:
status: Triaged → Invalid
Scott Moser (smoser)
no longer affects: curtin
Changed in maas-images:
importance: Undecided → Medium
assignee: nobody → Scott Moser (smoser)
status: Confirmed → Fix Committed
Revision history for this message
Scott Moser (smoser) wrote :

Hi.
I'm pretty sure that the issue here is that you were installing hwe-w (any hwe-* would have done the same).
Trusty's ephemeral image has the 'linux-generic' package (hwe-t) installed.
So when we boot the ephemeral image with an hwe-t kernel, it finds all the modules that would be available if linux-generic were installed.

However, when you boot into the ephemeral environment with any other hwe-* kernel, only the modules inside the initramfs are available. Generally speaking, we've attempted to make the initramfs "fat" and include anything we might need.

The change I made to maas-images was to install mdadm (and also lvm2) into the environment in which the 'boot-initrd' initramfs is generated. The result is that the mdadm and lvm2 initramfs hooks add the modules they think are necessary.

I'm attaching the changes that each package added.
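
As a rough sketch of what that kind of image-build change amounts to (the chroot path is a placeholder, not the actual maas-images tooling):

sudo chroot /path/to/image-root apt-get install -y mdadm lvm2
sudo chroot /path/to/image-root update-initramfs -u -k all
# the mdadm/lvm2 initramfs hooks then pull the raid/dm modules into the generated initrd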

Revision history for this message
Scott Moser (smoser) wrote :

maas daily images:
  trusty 20160217.1
  wily 20160217.1
should have the fixes. You can see which ephemeral image versions you have with: https://gist.github.com/smoser/2cec72d243404a72fdb5

Please test and re-open if you find this does not solve your problem.

Changed in maas:
milestone: 1.9.0 → none
Revision history for this message
Scott Moser (smoser) wrote :

The latest precise images (20160218) should also have mdadm and lvm in their initramfs.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Yes, confirmed: the latest dailies resolved this. I can now deploy systems using a software RAID storage config.

tags: removed: hwcert-server
Changed in maas-images:
status: Fix Committed → Fix Released