curtin fails to deploy centos 8 on nvme with multipath from ubuntu 20.04

Bug #1914812 reported by Dimitri John Ledkov
This bug affects 2 people
Affects   Status         Importance   Assigned to           Milestone
MAAS      Fix Released   High         Alexsander de Souza
3.3       Fix Released   High         Alexsander de Souza
3.4       Fix Released   High         Alexsander de Souza
curtin    Won't Fix      Undecided    Unassigned

Bug Description

https://bugs.launchpad.net/ubuntu/+source/efivar/+bug/1891718 aka https://github.com/rhboot/efivar/pull/158 is not fixed in centos.

centos 8 does not support multipath nvme.

ubuntu 20.04 supports multipath nvme.

efibootmgr in a centos chroot, when booted into the ubuntu 20.04 kernel, fails to create a boot entry for the multipath nvme drive.

efibootmgr on ubuntu 20.04 can create the correct entry for the multipath nvme drive, and the entry is a universal one (based on gpt uuids, rather than anything nvme specific). Thus it does not matter which efibootmgr is used to create it.

But since the centos one is buggy, we should install efibootmgr in the ephemeral environment and use that one, instead of the buggy one inside the centos chroot.

as workarounds, one can
* choose a non-nvme drive as the boot device
* use ubuntu 18.04 or older as the ephemeral provisioning environment
* in maas, set the ephemeral environment to boot with the kernel argument
  "nvme-core.multipath=0"
* ask RHEL to fix their stuff
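
A minimal sketch of the "use the ephemeral environment's efibootmgr" idea described above. It only builds the command line; it is not curtin code. The helper name `build_efibootmgr_create`, the disk `/dev/nvme0n1`, partition `1`, the label, and the loader path `\EFI\centos\shimx64.efi` are all assumptions for illustration and would need to match the actual machine.

```python
import shlex


def build_efibootmgr_create(disk, part, label, loader):
    """Return an argv list that asks efibootmgr to create a boot entry.

    Hypothetical helper: it only assembles arguments, it runs nothing.
    """
    return [
        "efibootmgr",
        "--create",
        "--disk", disk,        # whole-disk device holding the ESP
        "--part", str(part),   # partition number of the ESP on that disk
        "--label", label,      # name shown in the firmware boot menu
        "--loader", loader,    # EFI path of the loader, relative to the ESP
    ]


# Assumed values; adjust to the real disk, ESP partition and loader path.
argv = build_efibootmgr_create(
    "/dev/nvme0n1", 1, "CentOS", r"\EFI\centos\shimx64.efi"
)
print(shlex.join(argv))
```

The printed command would be run from the ephemeral environment (outside the centos chroot), since that is where the working efibootmgr lives.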

description: updated
Sean Feole (sfeole) wrote :

Thanks for the help Dimitri, I also ran into this problem today while trying to deploy centos8 via maas on a machine with an nvme drive. I was able to work around this by appending "nvme-core.multipath=0" to the kernel boot arguments.

Michael Hudson-Doyle (mwhudson) wrote :

Ah the irony: there was a curtin bug for a little while where it would try to use the efibootmgr outside the target, which was a problem because it's not installed in general (so I guess I would prefer to only use the host efibootmgr when installing centos). Is there no way we can do the job of efibootmgr ourselves, by feeding the GUID in directly or something?

Bill Wear (billwear) wrote :

so reading the comments, i have to ask: is this something MAAS can actually fix, or is it out of our grasp?

Changed in maas:
status: New → Incomplete
Sean Feole (sfeole) wrote :

I think that's for the maas team to decide ?

Alberto Donato (ack) wrote :

Michael, if MAAS were to install efibootmgr in the ephemeral env, would curtin prefer it over the one in the target?

It shouldn't be a problem for MAAS to install it, but which one should be used seems more of a curtin policy.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Andrey Grebennikov (agrebennikov) wrote :

Folks, let me resurrect this bug as it's impacting our partner and the customer.
I've attached the full log of the installation.

Clearly there is currently no workaround that allows deploying CentOS on NVMe with MAAS.

* Bionic is going out of support now.
* If I set the nvme_core.multipath parameter, it either breaks the Ubuntu deployment or prevents the machine from booting once the provisioning of CentOS has completed.

What other information do you need to mark the bug as confirmed and act on it?

Adam Collard (adam-collard) wrote :

@Michael Hudson-Doyle - please can you look into the latest comments and see what Curtin could do here?

Michael Hudson-Doyle (mwhudson) wrote :

I think it might be possible to work around this by setting grub.update_nvram to False in the config and then using a late command to install efibootmgr in the ephemeral environment and invoke that. I guess it wouldn't be quite trivial to figure out which disk and partition you need to pass to efibootmgr but it might work?
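
A rough sketch of the "which disk and partition" step, assuming the ESP is known by a plain kernel device name such as `/dev/nvme0n1p1` or `/dev/sda2`. The function name `split_partition` is hypothetical, and device-mapper names (for example under `/dev/mapper`) are deliberately not handled.

```python
import re


def split_partition(devpath):
    """Split a partition device path into (disk, partition number).

    Illustration only: covers nvme/mmcblk style names, which use a 'p'
    before the partition number, and sdX style names, which do not.
    """
    # e.g. /dev/nvme0n1p1 -> ("/dev/nvme0n1", 1)
    m = re.fullmatch(r"(/dev/(?:nvme\d+n\d+|mmcblk\d+))p(\d+)", devpath)
    if m is None:
        # e.g. /dev/sda2 -> ("/dev/sda", 2)
        m = re.fullmatch(r"(/dev/[a-z]+)(\d+)", devpath)
    if m is None:
        raise ValueError(f"not a recognised partition device: {devpath!r}")
    return m.group(1), int(m.group(2))


disk, part = split_partition("/dev/nvme0n1p1")
print(disk, part)
```

The two values map onto the `--disk` and `--part` arguments of efibootmgr in a late command.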

As for hacking a workaround into curtin, I guess it would be possible but it would also be pretty ugly. Is there really no avenue to getting this fixed in Centos?

Changed in maas:
status: Incomplete → New
Michael Hudson-Doyle (mwhudson) wrote :

So after a bit of discussion, the best way to deploy centos8 seems to be to boot the ephemeral environment, but not the deployed OS, with nvme-core.multipath=0. This can be done by putting nvme-core.multipath=0 before the --- separator on the kernel command line, but I don't think an operator of MAAS can do this; it would require a code change to MAAS (aiui, I might be wrong).
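
To illustrate the separator placement, here is a small sketch that lays out a kernel command line around `---`. It is not MAAS code: `compose_cmdline` and its parameter names are hypothetical, and `console=tty0` is just a stand-in for any option that should also reach the deployed system.

```python
def compose_cmdline(ephemeral_only, carried_over):
    """Join kernel options around the '---' separator.

    Options before '---' are meant for the ephemeral environment only;
    options after it are also carried over to the deployed system.
    """
    return " ".join(list(ephemeral_only) + ["---"] + list(carried_over))


cmdline = compose_cmdline(["nvme-core.multipath=0"], ["console=tty0"])
print(cmdline)  # nvme-core.multipath=0 --- console=tty0
```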

The other way to fix this would be to use efibootmgr from the ephemeral environment when deploying centos8. This would be a curtin change (and a pretty ugly one but well) but more significantly, my understanding is that efibootmgr is not currently part of the ephemeral environment. We could install it during deployment but this obviously requires archive access which I don't think is a given? Or we could change the process that builds the ephemeral environment to include it but I don't know how to do that. So from my perspective the command line hack in MAAS is more appropriate (sorry!).

Changed in curtin:
status: New → Won't Fix
Michael Hudson-Doyle (mwhudson) wrote :

Tentatively updating bug statuses to match previous comment. Feel free to disagree!

Michael Hudson-Doyle (mwhudson) wrote :

Oh I forgot about the first workaround I thought of, which has the advantage of being centos specific and not breaking ubuntu deployments. Put something like this https://paste.ubuntu.com/p/HshCpxXCzD/ in your curtin_userdata_centos file. (This doesn't do the boot entry reordering curtin does by default -- that could be added too if needed I guess).

tags: added: bug-council
Thorsten Merten (thorsten-merten) wrote :

From our meeting notes:

- Kernel options can be put before the `---` (to have them just for the ephemeral environment) or after it (to have them for both the ephemeral environment and the deployed environment)
- Right now, the code that composes the kernel options gets lied to when we're deploying a non-Ubuntu OS, so e.g. for CentOS the params.osystem is set to Ubuntu Focal
- To fix the bug as we had hoped, we'd need to propagate the target osystem through to the compose_purpose_opts and friends
    - This would require changing the RPC
    - The last time that we have the actual target OS is in src/maasserver/rpc/boot.py:get_boot_config_for_machine()
    - and we need it in src/provisioningserver/kernel_opts.py:compose_purpose_opts()
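
A purely illustrative sketch of the decision these notes describe, assuming the real target OS were available where the kernel options are composed. It is not the MAAS implementation: `extra_ephemeral_opts`, the `target_osystem` argument, and the contents of `AFFECTED_TARGETS` are all hypothetical.

```python
# Assumption: target OS names for which the ephemeral environment should
# boot with NVMe multipath disabled. The real identifiers may differ.
AFFECTED_TARGETS = {"centos"}


def extra_ephemeral_opts(target_osystem):
    """Return kernel options to place before '---' for this target OS."""
    if target_osystem in AFFECTED_TARGETS:
        return ["nvme-core.multipath=0"]
    return []


print(extra_ephemeral_opts("centos"))
print(extra_ephemeral_opts("ubuntu"))
```

The point is only that the choice depends on the target OS rather than on the OS of the ephemeral image, which is why the target osystem has to be propagated through the RPC.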

Estimated at 3 story points --> gnarly work but not overly complicated

Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
milestone: none → 3.4.x
tags: removed: bug-council
Changed in maas:
importance: Undecided → High
status: New → Triaged
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.4.x → 3.5.0
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released