Unbootable system after installation

Bug #1671605 reported by Ante Karamatić
This bug affects 6 people
Affects   Status      Importance  Assigned to  Milestone
MAAS      Incomplete  High        Unassigned
curtin    Invalid     Undecided   Unassigned

Bug Description

On 2.2b2, installing nodes sometimes ends with an unbootable system. Commissioning goes just fine and installation finishes, but on the next boot (into the installed system) the console shows:

Intel(R) Boot Agent XE v2.3.11
Copyright (C) 1997-2013, Intel Corporation

CLIENT MAC ADDR: 2C 60 0C CD 0F CF GUID: 7F82C921 3D4A 11E5 A482 2C600CCD0FD1
CLIENT IP: 172.16.7.22 MASK: 255.255.255.0 DHCP IP: 172.16.7.2
GATEWAY IP: 172.16.7.1

PXELINUX 6.03 PXE 20151222 Copyright (C) 1994-2014 H. Peter Anvin et al
Booting local disk ...
WARN: No MBR magic, treating disk as raw.
Booting...

Rebooting the system and booting it directly from the disk brings up the installed system just fine.

It seems that the PXE image is handing off booting to one of the disks on the machine that doesn't have the system installed. Installation in this case is done on sdf (MAAS automatically selected this disk, without user interaction), and it seems that PXE hands off booting to sda or some other disk.
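
The "No MBR magic" warning in the console output indicates that the disk picked for local boot has no valid boot signature. On a BIOS/MBR disk the first 512-byte sector ends with the bytes 0x55 0xAA at offsets 510 and 511. A minimal sketch for checking candidate disks follows; the function names are hypothetical, and reading a real device such as /dev/sda is assumed to require root:

```python
MBR_SIGNATURE = b"\x55\xaa"  # boot signature stored at byte offsets 510-511
SECTOR_SIZE = 512


def has_mbr_magic(sector: bytes) -> bool:
    """Return True if the given first sector carries the MBR boot signature."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == MBR_SIGNATURE


def disk_has_mbr_magic(path: str) -> bool:
    """Read the first sector of a disk (or disk image) and check its signature."""
    with open(path, "rb") as f:
        return has_mbr_magic(f.read(SECTOR_SIZE))


# Example (assumed device name, run as root):
# print(disk_has_mbr_magic("/dev/sda"))
```

Running this against each disk on the node would show which ones the BIOS could boot from and which one would trigger the warning above.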

Ante Karamatić (ivoks) wrote :
Andres Rodriguez (andreserl) wrote :

Hi Ante,

This looks like the root device being different from the boot device in the BIOS. In the MAAS UI, did you select the 'boot' device for the disk that is the default boot disk in the BIOS?

Changed in maas:
status: New → Incomplete
Ante Karamatić (ivoks) wrote :

I did not select anything. I commissioned the nodes and then hit deploy. MAAS automatically selected the root device.

When I try to change the root device in the UI, MAAS creates a GPT partition table and I can't tell it to create MBR. This then again results in an unbootable system, of course.

Chris Gregan (cgregan) wrote :

We've been tracking what seems like the same bug here: https://bugs.launchpad.net/juju-core/+bug/1670499

tags: added: cdo-qa-blocker
Chris Gregan (cgregan) wrote :

Please note that this issue may be fixed in

MAAS Version 2.2.0 (beta3+bzr5815)

Andres Rodriguez (andreserl) wrote :

@Ante,

As stated before, it sounds like the 'Boot' device selected in the BIOS and the first disk identified by the OS are different. In other words:

1. The OS identifies hdX as sda, which is /not/ set as the boot device in the BIOS.
2. The BIOS has hdY as the boot device, identified as sd[b...x].

So, while MAAS installs to 'sda', the BIOS attempts to boot from 'sd[b...x]'. To handle that, you can select in MAAS which device is the *boot* device.

Ante, in the UI, can you please set the 'Boot' option on a disk other than 'sda', namely the disk the BIOS is booting from?

Ante Karamatić (ivoks) wrote :

Andres

If you reread again, you'll notice that MAAS installs to sdf (BIOS boots
from sda), and selecting anything else is impossible because MAAS decides
to use GPT, instead of MBR.

Problem doesn't exist on 2.1. This is a regression.


Ante Karamatić (ivoks) wrote :

Just noticed I didn't write anywhere that MAAS installs to sdf. It creates MBR.

Selecting other disks as boot creates GPT and makes the system unbootable.


Ante Karamatić (ivoks) wrote :

Selecting sda as boot device, while leaving sdf to be root device (as selected by MAAS) also doesn't work:

 + grub-install /dev/sda
 Installing for i386-pc platform.
 grub-install: error: unable to identify a filesystem in hostdisk//dev/sda; safety check can't be performed.
 + exit
 failed to install grub!

Andres Rodriguez (andreserl) wrote :

Looking at the config I see the following, which seems to be the correct configuration.

  - grub_device: true
    id: sda
    model: INTEL SSDSC2BX48
    name: sda
    ptable: msdos
    serial: BTHC63800AK2480MGN
    type: disk
    wipe: superblock
  - device: sdf
    id: sdf-part1
    name: sdf-part1
    number: 1
    offset: 4194304B
    size: 1000198897664B
    type: partition
    uuid: c4fbf7b8-58fe-44a7-b7dc-697d9968ae16
    wipe: superblock
  - fstype: ext4
    id: sdf-part1_format
    label: root
    type: format
    uuid: a5940775-aaec-4b0c-a10f-eb98f65122be
    volume: sdf-part1
  - device: sdf-part1_format
    id: sdf-part1_mount
    path: /
    type: mount

@Ante, please also attach the full installation log for the curtin developers.

Ante Karamatić (ivoks) wrote :

In this run I set sdb as root device. I kept sda as boot device.

Blake Rouse (blake-rouse) wrote :

Ante,

Change the boot device in the UI to be the correct boot disk. Then over the API reset the storage layout.

maas admin machine set-storage-layout <node-id> storage_layout=flat

Ryan Harper (raharper) wrote :

There's no boot partition; something has to hold /boot.

On Tue, Mar 14, 2017 at 5:57 AM, Ante Karamatić <email address hidden> wrote:

> In this run I set sdb as root device. I kept sda as boot device.
>
> ** Attachment added: "curtin.log"
> https://bugs.launchpad.net/maas/+bug/1671605/+attachment/4837513/+files/curtin.log

Scott Moser (smoser) wrote :

The config shown in comment 10, and in the attachment 'curtin data', is bad.
It says that 'sda' (BTHC63800AK2480MGN) is the grub drive (boot disk), but that it should be wiped and not partitioned at all.
The installation is done to 'sdf' (1000198897664B), which is partitioned with a dos partition table.

Note, these letters don't mean anything; what matters is the serial on the device.

I'm pretty sure that is just busted configuration. Grub cannot install to an un-partitioned disk, so grub installation fails, and curtin reports this failure.
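
The problem described above can be spotted mechanically before deployment. The following is only a sketch, not curtin's or MAAS's own validation: it assumes the storage config has already been loaded into a list of dicts, uses the field names visible in the config from comment 10 (`type`, `id`, `device`, `grub_device`), and the helper name is hypothetical.

```python
def grub_disks_without_partitions(storage_config):
    """Return ids of disks flagged as grub_device that no partition entry refers to."""
    # Collect the ids of all disks that have at least one partition defined.
    partitioned = {
        item["device"]
        for item in storage_config
        if item.get("type") == "partition"
    }
    return [
        item["id"]
        for item in storage_config
        if item.get("type") == "disk"
        and item.get("grub_device")
        and item["id"] not in partitioned
    ]


# Trimmed-down version of the entries shown in comment 10.
config = [
    {"id": "sda", "type": "disk", "ptable": "msdos", "grub_device": True},
    {"id": "sdf-part1", "type": "partition", "device": "sdf", "number": 1},
    {"id": "sdf-part1_format", "type": "format", "fstype": "ext4", "volume": "sdf-part1"},
    {"id": "sdf-part1_mount", "type": "mount", "path": "/", "device": "sdf-part1_format"},
]

print(grub_disks_without_partitions(config))  # ['sda']
```

An empty result would mean every grub device has at least one partition for grub-install to work with.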

Andres Rodriguez (andreserl) wrote :

Hey Ante,

Scott looked at the issue and confirmed there's a missing piece here. It seems I also misled you.

For 'sda' to work as a 'boot' device, it needs a partition. In the meantime can you do:

1. create an empty partition in 'sda'.
2. select 'sda' as 'boot'.

That should cause grub to install in the partition. I'll check with my team whether we should be handling this automatically, although that may not be desirable if other partitions are created on the 'boot' device.

Scott Moser (smoser) wrote :

To demonstrate the issue, I booted an OpenStack VM with 2 disks (vda, vdb). It is booted in BIOS mode (not UEFI).

## wipe the disk
$ disk="/dev/vdb"
$ sudo umount $disk
$ sudo dd if=/dev/zero of=$disk bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00783073 s, 134 MB/s

$ sudo udevadm settle

## attempt installation of grub to /dev/vdb
$ sudo grub-install "$disk"
Installing for i386-pc platform.
grub-install: error: unable to identify a filesystem in hostdisk//dev/vdb; safety check can't be performed.

## now partition it and try
$ (echo unit: sectors; echo label: dos; echo 2048,) | sudo sfdisk --force $disk
Checking that no-one is using this disk right now ... OK

Disk /dev/vdb: 40 GiB, 42949672960 bytes, 83886080 sectors
...
Device Boot Start End Sectors Size Id Type
/dev/vdb1 2048 83886079 83884032 40G 83 Linux
...

$ sudo udevadm settle
$ sudo grub-install $disk
Installing for i386-pc platform.
Installation finished. No error reported.

Ante Karamatić (ivoks) wrote :

Uhm... I'm quite sure it doesn't need partition, but it does need MBR.

Changed in maas:
milestone: none → 2.2.0
importance: Undecided → High
Chris Gregan (cgregan) wrote :

I'd like to bump this to Critical as it blocks deploys in our CI and is now a Customer blocker. This will gate the release if it is not fixed.

Scott Moser (smoser) wrote :

I would have thought this would work, but grub is definitely insisting on there being a partition on the disk that you grub-install to, not just a partition table.

See example here... 'parted' is what curtin uses for partitioning in msdos.

## wipe the disk (first 2M)
$ disk="/dev/vdb"
$ sudo dd if=/dev/zero of=$disk bs=1M count=2
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00783073 s, 134 MB/s
$ sudo udevadm settle

## show failure of installation with no partition table
$ sudo grub-install "$disk"
Installing for i386-pc platform.
grub-install: error: unable to identify a filesystem in hostdisk//dev/vdb; safety check can't be performed.

## put a partition table on disk, but no partitions
$ sudo parted $disk --script mklabel msdos
$ sudo blkid $disk
/dev/vdb: PTUUID="502420fa" PTTYPE="dos"

## attempt grub install
$ sudo grub-install "$disk" ; echo $?
Installing for i386-pc platform.
grub-install: error: unable to identify a filesystem in hostdisk//dev/vdb; safety check can't be performed.
1

## Try harder
$ sudo grub-install --skip-fs-probe $disk ; echo $?
Installing for i386-pc platform.
grub-install: warning: Attempting to install GRUB to a partitionless disk or to a partition. This is a BAD idea..
grub-install: error: embedding is not possible, but this is required for cross-disk install.
1

Ryan Harper (raharper) wrote :

On Wed, Mar 15, 2017 at 12:44 PM, Scott Moser <email address hidden> wrote:

> I would have thought this would work, but grub is definitely insisting
> on there being a partition on the disk that you grub-install to, not
> just a partition table.
>

In particular, unless one creates a real partition, the 'MBR gap' that grub
uses cannot be calculated. The gap is defined as the space between the end
of the MBR and the start of the first partition.

Without a first partition, grub cannot calculate this space.
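
As a worked illustration of that definition (a sketch with a hypothetical helper name, assuming 512-byte sectors and the MBR occupying sector 0), the gap can be computed from the start sectors of the partitions on the disk:

```python
from typing import Optional, Sequence

SECTOR_SIZE = 512  # assumed logical sector size


def mbr_gap_bytes(partition_starts: Sequence[int],
                  sector_size: int = SECTOR_SIZE) -> Optional[int]:
    """Size of the space between the MBR (sector 0) and the first partition.

    Returns None when there are no partitions, since the gap is undefined then.
    """
    if not partition_starts:
        return None
    first_start = min(partition_starts)
    # Sectors 1 .. first_start - 1 lie between the MBR and the first partition.
    return (first_start - 1) * sector_size


print(mbr_gap_bytes([2048]))  # 1048064
print(mbr_gap_bytes([]))      # None
```

With the first partition starting at sector 2048, as in the sfdisk example earlier in the thread, the gap is 2047 sectors; with no partitions at all there is nothing to measure against, which matches the grub-install failure on a disk that only has a partition table.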

Ryan Harper (raharper) wrote :

Marking curtin task invalid at this time.

grub requires at least one partition to determine the MBR gap, or a bios_boot partition for GPT (UEFI systems use the /boot/efi partition to hold grub data). Grub upstream does not support using blocklists, as any update to the filesystem may leave the system unbootable again. curtin is doing as told; please re-open if you find new information indicating that curtin needs to do something different.

Changed in curtin:
status: New → Invalid
Ante Karamatić (ivoks) wrote :

With MAAS 2.1, I do not have this problem. Whichever disk I select as boot/root, it boots just fine. I've attached outputs of get-curtin-config and 07-block-devices.out.

Ryan Harper (raharper) wrote :

The curtin config in this case consistently uses sdb; I assume that's the
disk you selected as 'boot/root'.

sdb is marked with grub_device: True and uses an msdos partition table, and
there's at least one partition on sdb (sdb-part1, which is also root).

On Fri, Mar 17, 2017 at 5:44 AM, Ante Karamatić <email address hidden> wrote:

> With MAAS 2.1, I do not have this problem. Whichever disk I select as
> boot/root, it boots just fine. I've attached outputs of
> get-curtin-config and 07-block-devices.out.
>
> ** Attachment added: "2.1.tar"
> https://bugs.launchpad.net/maas/+bug/1671605/+attachment/4839337/+files/2.1.tar

Andres Rodriguez (andreserl) wrote :

I discussed this further with Ante, and he did suggest we mark this bug as 'Won't Fix', provided that it is confirmed that the install disk is different from the boot disk. That is not an issue with MAAS itself, but rather a configuration issue.

That, however, doesn't change the fact that there's still a bug for the selection of a boot device.

Patrizio Bassi (patrizio-bassi) wrote :

I have the same issue on MAAS 2.1.5. It affects all my machines with more than 1 disk (I mean, after hardware RAID configuration). I haven't understood how to work around it.

Zoltan Arnold Nagy (zoltan) wrote :

I'm hitting the same issue although even deploy fails. This happens on a system with a single SATA SSD and NVMe drives.

Removing the NVMe drives physically fixes the issue.

The boot disk and the install disk would be the same but please note that I don't even get to deploy a system in my case.

Daniel Souza (danielsouzasp) wrote :

Hello guys,

I'm running MAAS version 2.3.5 (ubuntu1~16.04.1), and I am facing the same issue with 1 NVMe drive + 2x SATA SSD. MAAS installs the OS on nvme0n1-part1 with no error, but when it tries to boot from the local disk after the PXE load, I see the error "No MBR magic, treating disk as raw". If I boot manually from the NVMe disk, the deploy process finishes normally, but it won't work on the next reboot.

Is this a MAAS bug?

Daniel Souza (danielsouzasp) wrote :

Additional info: I can see "APPEND hd0" in
/usr/lib/python3/dist-packages/provisioningserver/templates/pxe/config.local.amd64.template
Maybe we need some conditional here for these cases.

Michael Cowart (evtmcowart) wrote :

I'm seeing this same issue on 2.4. I have a server with a single NVMe drive + 8 SATA SSDs configured in software RAID. On initial commission MAAS wanted to install / to one of the SATA drives. After installing the root/boot partitions to the NVMe drive, the system will not boot.

Dylan Wang (hyuwang) wrote :

I have the same issue on 2.5. It only happens on one specific server among hundreds.

When I do enlist -> commission -> config -> deploy, everything goes well.

But on that one, I did enlist -> config disk & network -> failed commission, then I tried mark broken/rescue/exit rescue/... and eventually fixed it by re-commissioning, then deployed.

Deploy works, but the machine is never able to finish booting from the local disk.
