curtin

curtin fails to deploy centos 8 on nvme with multipath from ubuntu 20.04

Bug #1914812 reported by Dimitri John Ledkov on 2021-02-05

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
MAAS	Fix Released	High	Alexsander de Souza	MAAS 3.5.0-beta1
3.3	Fix Released	High	Alexsander de Souza	MAAS 3.3.5
3.4	Fix Released	High	Alexsander de Souza	MAAS 3.4.0-rc1
curtin	Won't Fix	Undecided	Unassigned

Bug Description

https://bugs.launchpad.net/ubuntu/+source/efivar/+bug/1891718 aka https://github.com/rhboot/efivar/pull/158 is not fixed in centos.

centos 8 does not support multipath nvme.

ubuntu 20.04 supports multipath nvme.

efibootmgr in a centos chroot, when booted into the ubuntu 20.04 kernel, fails to create a boot entry for the multipath nvme drive.

efibootmgr on ubuntu 20.04 can create a correct entry for the multipath nvme drive, and the entry is a universal one (based on gpt uuids, rather than anything nvme specific). Thus it does not matter which efibootmgr is used to create it.

But since the centos one is buggy, we should be installing efibootmgr in the ephemeral environment and using that one, instead of the buggy one inside the centos chroot.

As workarounds, one can:
* choose a non-nvme drive as the boot device
* use ubuntu 18.04 or older as the ephemeral provisioning environment
* in maas, set the ephemeral environment to boot with the kernel argument "nvme-core.multipath=0"
* ask RHEL to fix their stuff
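
To make the difference between the two efibootmgr copies concrete, here is a minimal Python sketch that only builds the command line, either for the efibootmgr in the ephemeral environment or for the one inside the target chroot. The function name, the disk path, the partition number, the label and the loader path are all hypothetical values for illustration; the real ones depend on the machine and the installed image.

```python
import shlex
from typing import List, Optional


def efibootmgr_argv(
    disk: str,
    partition: int,
    label: str,
    loader: str,
    target_chroot: Optional[str] = None,
) -> List[str]:
    """Build an efibootmgr command line that creates a boot entry.

    With target_chroot=None the command uses the efibootmgr found in the
    ephemeral environment. Otherwise it is wrapped in chroot, so the copy
    inside the target (the one this bug is about) is used instead.
    """
    cmd = [
        "efibootmgr",
        "--create",
        "--disk", disk,
        "--part", str(partition),
        "--label", label,
        "--loader", loader,
    ]
    if target_chroot is not None:
        cmd = ["chroot", target_chroot] + cmd
    return cmd


if __name__ == "__main__":
    # Hypothetical disk, partition and loader path; adjust for the real machine.
    host_cmd = efibootmgr_argv(
        "/dev/nvme0n1", 1, "centos", r"\EFI\centos\shimx64.efi"
    )
    print(shlex.join(host_cmd))
```

The sketch deliberately does not execute anything. Running the resulting command still requires efibootmgr to be installed in whichever environment is chosen.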


Related branches

~alexsander-souza/maas:lp1914812_to_3_3

Merged into maas:3.3

Alexsander de Souza: Approve on 2023-08-24

MAAS Lander: Approve on 2023-08-23

~alexsander-souza/maas:lp1914812_to_3_4

Merged into maas:3.4

Alexsander de Souza: Approve on 2023-08-24

MAAS Lander: Approve on 2023-08-23

~alexsander-souza/maas:lp1914812_add_centos_quirks

Merged into maas:master

Christian Grabowski: Approve on 2023-07-10

MAAS Lander: Approve on 2023-07-08

Dimitri John Ledkov (xnox) on 2021-02-05

description:	updated


Sean Feole (sfeole) wrote on 2021-02-05:

Thanks for the help Dimitri, I also ran into this problem today while trying to deploy centos8 via maas on a machine with an nvme drive. I was able to work around this by appending "nvme-core.multipath=0" to the kernel boot arguments.


Michael Hudson-Doyle (mwhudson) wrote on 2021-02-08:

Ah the irony: there was a curtin bug for a little while where it would try to use the efibootmgr outside the target, which was a problem because it's not installed in general (so I guess I would prefer to only use the host efibootmgr when installing centos). Is there no way we can do the job of efibootmgr ourselves, by feeding the GUID in directly or something?


Bill Wear (billwear) wrote on 2021-07-07:

so reading the comments, i have to ask: is this something MAAS can actually fix, or is it out of our grasp?

Changed in maas:
status:	New → Incomplete


Sean Feole (sfeole) wrote on 2022-03-16:

I think that's for the maas team to decide?


Alberto Donato (ack) wrote on 2022-04-04:

Michael, if MAAS were to install efibootmgr in the ephemeral env, would curtin prefer it over the one in the target?

It shouldn't be a problem for MAAS to install it, but which one should be used seems more of a curtin policy.

Changed in maas:
status:	Incomplete → New
status:	New → Incomplete


Andrey Grebennikov (agrebennikov) wrote on 2023-05-16:

https://pastebin.ubuntu.com/p/VYb87FzMpP/

Folks, let me resurrect this bug as it's impacting our partner and the customer.
I've attached the full log of the installation.

Clearly there is currently no workaround that allows deploying CentOS on NVMe with MAAS.

* Bionic is going out of support now.
* If I set the nvme_core.multipath parameter, it either breaks the Ubuntu deployment or prevents the machine from booting once the provisioning of CentOS has completed.

What other information do you need to tag the bug as confirmed and act on it?


Adam Collard (adam-collard) wrote on 2023-05-18:

@Michael Hudson-Doyle - please can you look into the latest comments and see what Curtin could do here?


Michael Hudson-Doyle (mwhudson) wrote on 2023-05-18:

I think it might be possible to work around this by setting grub.update_nvram to False in the config and then using a late command to install efibootmgr in the ephemeral environment and invoke that. I guess it wouldn't be quite trivial to figure out which disk and partition you need to pass to efibootmgr but it might work?
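
A rough Python sketch of what such a config fragment could look like, built as a plain dict and printed as JSON. This is an assumption-laden illustration, not a tested workaround: the function name, the late command name, and the disk, partition and loader values are hypothetical, and installing efibootmgr with apt-get in the ephemeral environment needs archive access.

```python
import json
import shlex
from typing import Any, Dict


def centos_efi_workaround_config(
    disk: str, partition: int, loader: str
) -> Dict[str, Any]:
    """Return a curtin config fragment as a plain dict.

    grub.update_nvram=False asks curtin not to update NVRAM itself. The
    late command then installs and runs efibootmgr in the ephemeral
    environment. disk, partition and loader are hypothetical inputs that
    have to be worked out for the machine being deployed.
    """
    create_entry = " ".join(
        shlex.quote(arg)
        for arg in [
            "efibootmgr", "--create",
            "--disk", disk,
            "--part", str(partition),
            "--label", "centos",
            "--loader", loader,
        ]
    )
    # One shell command so that the install happens before the entry is made.
    script = "apt-get install --yes efibootmgr && " + create_entry
    return {
        "grub": {"update_nvram": False},
        "late_commands": {
            # Hypothetical command name; any unique key would do.
            "90_efibootmgr_from_ephemeral": ["sh", "-c", script],
        },
    }


if __name__ == "__main__":
    config = centos_efi_workaround_config(
        "/dev/nvme0n1", 1, r"\EFI\centos\shimx64.efi"
    )
    print(json.dumps(config, indent=2))
```

The hard part mentioned above, working out the right disk and partition, is left to the caller here on purpose.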

As for hacking a workaround into curtin, I guess it would be possible but it would also be pretty ugly. Is there really no avenue to getting this fixed in Centos?

Michael Hudson-Doyle (mwhudson) on 2023-05-21

Changed in maas:
status:	Incomplete → New


Michael Hudson-Doyle (mwhudson) wrote on 2023-05-21:

So after a bit of discussion, the best way to deploy centos8 seems to be to boot the ephemeral environment with nvme-core.multipath=0, but not the deployed OS. This can be done by putting nvme-core.multipath=0 before the --- separator on the kernel command line, but I don't think this can be done by an operator of MAAS; it would require a code change to MAAS (aiui, I might be wrong).
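
As an illustration of the --- separator, a small Python sketch that composes and splits a kernel command line. Options before the separator are meant for the ephemeral environment only; options after it are also carried over to the deployed system. The helper names and the console option in the example are hypothetical.

```python
from typing import List, Tuple

SEPARATOR = "---"


def compose_cmdline(ephemeral_only: List[str], persistent: List[str]) -> str:
    """Join kernel options, placing ephemeral-only ones before '---'."""
    return " ".join(ephemeral_only + [SEPARATOR] + persistent)


def split_cmdline(cmdline: str) -> Tuple[List[str], List[str]]:
    """Split a command line into (before '---', after '---') option lists."""
    tokens = cmdline.split()
    if SEPARATOR not in tokens:
        # No separator: treat everything as belonging to the first group.
        return tokens, []
    idx = tokens.index(SEPARATOR)
    return tokens[:idx], tokens[idx + 1:]


if __name__ == "__main__":
    line = compose_cmdline(["nvme-core.multipath=0"], ["console=ttyS0"])
    print(line)
    print(split_cmdline(line))
```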

The other way to fix this would be to use efibootmgr from the ephemeral environment when deploying centos8. This would be a curtin change (and a pretty ugly one but well) but more significantly, my understanding is that efibootmgr is not currently part of the ephemeral environment. We could install it during deployment but this obviously requires archive access which I don't think is a given? Or we could change the process that builds the ephemeral environment to include it but I don't know how to do that. So from my perspective the command line hack in MAAS is more appropriate (sorry!).

Changed in curtin:
status:	New → Won't Fix


Michael Hudson-Doyle (mwhudson) wrote on 2023-05-21:

#10

Tentatively updating bug statuses to match previous comment. Feel free to disagree!


Michael Hudson-Doyle (mwhudson) wrote on 2023-05-22:

#11

Oh I forgot about the first workaround I thought of, which has the advantage of being centos specific and not breaking ubuntu deployments. Put something like this https://paste.ubuntu.com/p/HshCpxXCzD/ in your curtin_userdata_centos file. (This doesn't do the boot entry reordering curtin does by default -- that could be added too if needed I guess).

Jerzy Husakowski (jhusakowski) on 2023-05-25

tags:

added: bug-council


Thorsten Merten (thorsten-merten) wrote on 2023-06-01:

#12

From our meeting notes:

- Kernel options can be put before the `---` (to have it just for the ephemeral environment) or after to have it for the ephemeral environment and the deployed environment
- Right now, the code that composes the kernel options gets lied to when we're deploying a non-Ubuntu OS, so for e.g. CentOS the params.osystem is set to Ubuntu Focal
- To fix the bug as we had hoped, we'd need to propagate the target osystem through to the compose_purpose_opts and friends
    - This would require changing the RPC
    - The last time that we have the actual target OS is in src/maasserver/rpc/boot.py:get_boot_config_for_machine()
    - and we need it in src/provisioningserver/kernel_opts.py:compose_purpose_opts()

Estimated at 3 story points --> gnarly work but not overly complicated
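
To illustrate why the real target OS has to reach the option-composing code, here is a hedged Python sketch. It is not the actual compose_purpose_opts from MAAS, whose signature is not shown in this report; the quirk table, the function name and the base option are hypothetical.

```python
from typing import Dict, List

# Hypothetical quirk table: extra ephemeral-only kernel options per target OS.
EPHEMERAL_QUIRKS: Dict[str, List[str]] = {
    "centos": ["nvme-core.multipath=0"],
}


def compose_purpose_opts_sketch(
    target_osystem: str, base_opts: List[str]
) -> List[str]:
    """Prepend OS-specific quirks to the options that go before '---'."""
    quirks = EPHEMERAL_QUIRKS.get(target_osystem.lower(), [])
    return quirks + list(base_opts)


if __name__ == "__main__":
    base = ["ro"]  # hypothetical base option
    # If the code is told "ubuntu" for a CentOS deployment, no quirk applies.
    print(compose_purpose_opts_sketch("ubuntu", base))
    # With the real target OS propagated, the quirk is added.
    print(compose_purpose_opts_sketch("centos", base))
```

The point of the sketch is only that the lookup key must be the OS being deployed, not the OS of the ephemeral environment.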

Changed in maas:
assignee:	nobody → Alexsander de Souza (alexsander-souza)
milestone:	none → 3.4.x

Jerzy Husakowski (jhusakowski) on 2023-06-01

tags:	removed: bug-council
Changed in maas:
importance:	Undecided → High
status:	New → Triaged

Alexsander de Souza (alexsander-souza) on 2023-07-08

Changed in maas:
status:	Triaged → In Progress

MAAS Lander (maas-lander) on 2023-07-10

Changed in maas:
status:	In Progress → Fix Committed

Adam Collard (adam-collard) on 2023-07-17

Changed in maas:
milestone:	3.4.x → 3.5.0

Anton Troyanov (troyanov) on 2024-03-05

Changed in maas:
milestone:	3.5.0 → 3.5.0-beta1
status:	Fix Committed → Fix Released
