[2.3+] Unable to disk erase if machine is deployed with a non-lts kernel

Bug #1730525 reported by Matt Dirba on 2017-11-06
This bug affects 3 people
Affects    Importance    Assigned to
MAAS       High          Lee Trager
2.3        High          Andres Rodriguez

Bug Description

I have asked maas 2.2.2 to erase disks as they are released. This works great for xenial but fails on machines deployed with artful. Here is what happens.

1) In maasserver/models/node.py the release_or_erase function calls either the start_disk_erasing or release function depending on whether the disks need to be erased or not. Note: release clears the distro and hwe_kernel flags from the node object but start_disk_erasing does not.
2) The machine reboots and requests its grub.cfg, which results in a call to get_config in maasserver/rpc/boot.py. From there, get_boot_purpose (defined in the same file) is called to determine the purpose of the boot. The boot purpose returned is "commissioning" because "The environment (boot images, kernel options, etc for erasing is the same as that of commissioning." (as documented in the comments)
3) Since the boot purpose is commissioning, we overwrite the osystem and series, but we do not modify the architecture or hwe_kernel.
4) Still in get_config, we validate this combination of an artful kernel with a xenial series, promptly throw the following error, and give up on erasing the disk:
maas.node: [error] hostname: Marking node failed: Missing boot image ubuntu/amd64/ga-17.10/xenial.
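
The steps above can be modelled with a small standalone sketch. This is not MAAS code: the Machine class, boot_image_key function, and reset_kernel flag are hypothetical names, and the commissioning series is assumed to be xenial. It only illustrates how overriding the series while keeping the deployed hwe_kernel produces the image key from the error message, and how resetting the kernel avoids the mismatch.

```python
# Illustrative model only; every name here is hypothetical, not a MAAS API.
from dataclasses import dataclass
from typing import Optional

# Assumption: the configured commissioning_distro_series is xenial.
COMMISSIONING_SERIES = "xenial"


@dataclass
class Machine:
    osystem: str
    series: str
    arch: str                    # e.g. "amd64"
    hwe_kernel: Optional[str]    # e.g. "ga-17.10" for an artful deployment


def boot_image_key(machine: Machine, purpose: str, reset_kernel: bool = False) -> str:
    """Return the osystem/arch/subarch/series key the boot would look up."""
    if purpose == "commissioning":
        # Series is overridden for commissioning (and therefore for erasing).
        osystem, series = "ubuntu", COMMISSIONING_SERIES
        # Without reset_kernel, the deployed kernel leaks into the lookup.
        kernel = None if reset_kernel else machine.hwe_kernel
    else:
        osystem, series = machine.osystem, machine.series
        kernel = machine.hwe_kernel
    subarch = kernel or "generic"
    return f"{osystem}/{machine.arch}/{subarch}/{series}"


artful = Machine("ubuntu", "artful", "amd64", "ga-17.10")
print(boot_image_key(artful, "commissioning"))
print(boot_image_key(artful, "commissioning", reset_kernel=True))
```

With reset_kernel left at its default, the first call builds the same mismatched key that appears in the log line above.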

My hack/workaround is as follows in case you are interested.

diff --git a/maasserver/rpc/boot.py b/maasserver/rpc/boot.py
index 5ec41bb..83d58d3 100644
--- a/maasserver/rpc/boot.py
+++ b/maasserver/rpc/boot.py
@@ -199,6 +199,9 @@ def get_config(
         if purpose == "commissioning":
             osystem = Config.objects.get_config('commissioning_osystem')
             series = Config.objects.get_config('commissioning_distro_series')
+            subarch = "generic"
+            machine.architecture = '{}/{}'.format(arch, subarch)
+            machine.hwe_kernel = None
         else:
             osystem = machine.get_osystem()
             series = machine.get_distro_series()

Andres Rodriguez (andreserl) wrote :

To reproduce:

1. Deploy Artful
2. Release with disk erasing
3. The machine will attempt to boot the artful kernel with a Xenial image:

ubuntu/amd64/ga-17.10/xenial

We need to decide which of the following to do when erasing:

1. Boot the same kernel/image as the deployment, since the machine was deployed with artful.
2. Use the default commissioning kernel/image.
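
As a rough sketch of the two options, the hypothetical helper below (not part of MAAS) returns the image key each choice would boot. It assumes the amd64 architecture and that the default commissioning kernel maps to the "generic" subarch, as in the workaround diff earlier in this report.

```python
# Hypothetical helper for illustration; not a MAAS function.
def erase_boot_image(deployed, commissioning_series, follow_deployment):
    """Pick the image key for the erase boot.

    deployed is a (series, hwe_kernel) tuple, e.g. ("artful", "ga-17.10").
    """
    series, kernel = deployed
    if follow_deployment:
        # Option 1: reuse the kernel and image the machine was deployed with.
        return f"ubuntu/amd64/{kernel}/{series}"
    # Option 2: use the default commissioning kernel and image
    # (assumed here to be the "generic" subarch).
    return f"ubuntu/amd64/generic/{commissioning_series}"


print(erase_boot_image(("artful", "ga-17.10"), "xenial", follow_deployment=True))
print(erase_boot_image(("artful", "ga-17.10"), "xenial", follow_deployment=False))
```

Either option yields a kernel and series that belong together, which is the property the failing lookup lacks.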

summary: - maas 2.2.2 unable to disk erase artful deployment
+ [2.3+] maas 2.2.2 unable to disk erase artful deployment
Changed in maas:
milestone: none → 2.4.0beta3
milestone: 2.4.0beta3 → 2.4.0beta2
importance: Undecided → High
status: New → Triaged
summary: - [2.3+] maas 2.2.2 unable to disk erase artful deployment
+ [2.3+] Unable to disk erase if machine is deployed with a non-lts kernel
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
status: Fix Released → Fix Committed
Jason Hobbs (jason-hobbs) wrote :

I'm hitting this in 2.3.3:

Marking node failed - Missing boot image ubuntu/arm64/ga-18.04/xenial.

Changed in maas:
status: Fix Committed → Fix Released
Joe Julian (joe.julian) wrote :

Can this be backported to 2.3?
