Unable to find boot image when erasing disks when releasing

Bug #1705212 reported by Dongwon Cho on 2017-07-19
This bug affects 2 people
Affects: MAAS (Importance: High, Assigned to: Unassigned)
Affects: 2.2 (Importance: High, Assigned to: Unassigned)

Bug Description

MAAS is unable to find the boot image when erasing disks during release, for a node that was deployed with a kernel image provided by the previous image source (ephemeral-v2). As shown below, the release fails while looking for the old boot image with which the node was deployed.

cat /var/log/maas/maas.log
Jul 19 06:43:06 maas maas.node: [info] h12: Status transition from DEPLOYED to DISK_ERASING
Jul 19 06:43:06 maas maas.power: [info] Changing power state (on) of node: h12 (4y3h7x)
Jul 19 06:43:06 maas maas.node: [info] h12: Disk erasure started
Jul 19 06:43:31 maas maas.power: [info] Changed power state (on) of node: h12 (4y3h7x)
Jul 19 06:49:19 maas maas.node: [info] h12: Status transition from DISK_ERASING to FAILED_DISK_ERASING
Jul 19 06:49:19 maas maas.node: [error] h12: Marking node failed: Missing boot image ubuntu/amd64/hwe-x/xenial.
Jul 19 06:49:42 maas maas.node: [info] h12: Status transition from FAILED_DISK_ERASING to DISK_ERASING
Jul 19 06:49:42 maas maas.power: [info] Changing power state (on) of node: h12 (4y3h7x)
Jul 19 06:49:42 maas maas.node: [info] h12: Disk erasure started
Jul 19 06:50:07 maas maas.power: [info] Changed power state (on) of node: h12 (4y3h7x)
Jul 19 06:52:45 maas maas.node: [info] h12: Status transition from DISK_ERASING to FAILED_DISK_ERASING
Jul 19 06:52:45 maas maas.node: [error] h12: Marking node failed: Missing boot image ubuntu/amd64/hwe-x/xenial.
Jul 19 06:54:22 maas maas.node: [info] h12: Status transition from FAILED_DISK_ERASING to DISK_ERASING
Jul 19 06:54:22 maas maas.power: [info] Changing power state (on) of node: h12 (4y3h7x)
Jul 19 06:54:22 maas maas.node: [info] h12: Disk erasure started
Jul 19 06:54:47 maas maas.power: [info] Changed power state (on) of node: h12 (4y3h7x)
Jul 19 06:57:21 maas maas.node: [info] h12: Status transition from DISK_ERASING to FAILED_DISK_ERASING
Jul 19 06:57:21 maas maas.node: [error] h12: Marking node failed: Missing boot image ubuntu/amd64/hwe-x/xenial.
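
The repeated failures in the log excerpt above can be summarised with a small script. This is only a sketch for triage, not part of MAAS: the function name `missing_boot_images` and the regular expression are assumptions based solely on the line format shown in this excerpt, and the split into operating system, architecture, subarchitecture and release assumes the four-part path seen here (`ubuntu/amd64/hwe-x/xenial`).

```python
import re
from collections import Counter

# Pattern derived from the error lines in this report only; other MAAS
# versions may format the message differently.
MISSING_RE = re.compile(
    r"maas\.node: \[error\] (?P<host>\S+): Marking node failed: "
    r"Missing boot image (?P<image>\S+?)\.?$"
)


def missing_boot_images(lines):
    """Count 'Missing boot image' errors per (host, os, arch, subarch, release)."""
    counts = Counter()
    for line in lines:
        match = MISSING_RE.search(line.strip())
        if match is None:
            continue
        parts = match.group("image").split("/")
        if len(parts) != 4:
            # Not the four-part layout assumed here; skip rather than guess.
            continue
        counts[(match.group("host"), *parts)] += 1
    return counts


if __name__ == "__main__":
    # Default log location taken from the `cat` command in this report.
    with open("/var/log/maas/maas.log", encoding="utf-8", errors="replace") as log:
        for key, count in missing_boot_images(log).items():
            print(count, "/".join(key[1:]), "on", key[0])
```

Run against the log above, every failure should fall under a single key, which makes it easy to see that the node keeps asking for the same subarchitecture and release pair.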

dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=================================================
ii maas 2.1.5+bzr5596-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.5+bzr5596-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.5+bzr5596-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.5+bzr5596-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Changed in maas:
status: New → Won't Fix
status: Won't Fix → Incomplete
Andres Rodriguez (andreserl) wrote :

Hi There,

Starting from Xenial, the kernel naming has changed and a 'hwe-x' no longer exists. That, however, means that you need to use 'ephemeral-v3' as a stream, and MAAS 2.1+ requires you to use it. That provides the ga-16.04 kernel (or hwe-16.04). That said, I have a few questions:

1. Are you using the ephemeral-v3 stream or ephemeral-v2?
2. Was this machine deployed with a previous version of MAAS, after which you upgraded MAAS and are now trying to release the machine?

Dongwon Cho (dongwoncho) wrote :

Hi,

1. Yes, I am using ephemeral-v3.
2. Exactly!

I guess there will be no problem for users who began to use MAAS from 2.1+.

Thanks as always

Trent Lloyd (lathiat) wrote :

I am hitting this issue on MAAS 2.2 only when "disk erasing". It tries to boot from ubuntu/amd64/hwe-t/xenial/daily/boot-kernel.

This seems really strange as I don't think a hwe-t would ever have existed for xenial?

This install has been upgraded, I think since MAAS 1.9 but definitely since at least 2.0, then through 2.1 onto 2.2, including some beta/alpha versions.

Deploy & Commission are set to 16.04 GA (no HWE). Commission is successful and uses the ga-16.04/ path. Seems only disk erase (quick) is affected.

Looking through pg_dump super-naively with grep, I can see some related entries. Please advise if you want any more specific information.

58 2016-11-21 10:49:02.45629+08 2017-07-25 13:44:50.176052+08 0 ubuntu/xenial amd64/ga-16.04 {"subarches": "generic,hwe-p,hwe-q,hwe-r,hwe-s,hwe-t,hwe-u,hwe-v,hwe-w,ga-16.04"} generic \N f

1169163 2017-07-25 13:44:43.824057+08 2017-07-25 13:44:43.824057+08 ubuntu amd64 hwe-t xenial daily 1 Xenial Xerus 16.04 LTS 2021-04-21 lowlatency \N {}

1577782 2017-07-25 13:37:16.478143+08 2017-07-25 13:37:16.478143+08 ubuntu/amd64/hwe-t/xenial/daily/boot-kernel 243 10

(there are more lines, just including some examples)

Changed in maas:
status: Incomplete → Confirmed
Trent Lloyd (lathiat) wrote :

OK more details:
 - My Deploy & Commission kernels are set to 16.04 (GA kernel) in settings
 - Releasing a machine deployed as xenial machine works
 - Releasing a machine deployed as trusty fails, trying to boot xenial but under the hwe-t directory

So I suspect somewhere the code is confusing the deployed kernel with the default commissioning kernel for the path generation.
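
The suspicion above can be illustrated with a short sketch. It is not MAAS code: `boot_image_path` and the dictionary fields are hypothetical names, and the path layout is assumed from the paths quoted in this report. It only shows how combining the subarchitecture stored for the deployed node with the release used for the ephemeral environment would yield the path seen in the failure.

```python
def boot_image_path(osystem, arch, subarch, release):
    """Join the four components in the layout seen in this report's paths."""
    return "/".join([osystem, arch, subarch, release])


# Hypothetical record for a node deployed as trusty, per the comment above.
deployed = {
    "osystem": "ubuntu",
    "arch": "amd64",
    "subarch": "hwe-t",
    "release": "trusty",
}

# Release and kernel configured for commissioning in settings (16.04 GA).
commissioning_release = "xenial"
commissioning_subarch = "ga-16.04"

# Suspected behaviour: deployed subarch mixed with the commissioning release.
suspected = boot_image_path(
    deployed["osystem"], deployed["arch"], deployed["subarch"], commissioning_release
)

# A consistent pairing: both values taken from the commissioning settings.
consistent = boot_image_path(
    deployed["osystem"], deployed["arch"], commissioning_subarch, commissioning_release
)

print(suspected)
print(consistent)
```

The first path matches the `hwe-t` directory under which the failing release tries to boot xenial, while the second uses the `ga-16.04` path that commissioning is reported to use successfully.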

Changed in maas:
milestone: none → 2.3.0
importance: Undecided → High
Changed in maas:
milestone: 2.3.0 → 2.3.x
Daniel Souza (danielsouzasp) wrote :

Hello guys, I'm using MAAS version 2.3.0 (6434-gd354690-0ubuntu1~16.04.1) and I'm facing the same problem here. To use the node, I have to run this command:

cd /var/lib/maas/boot-resources/current/ubuntu/amd64/hwe-t ; ln -s trusty xenial

Will this be fixed in the next version?

Thank you so much!
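
The symlink workaround described in this comment can also be scripted so that it is safe to repeat. This is a sketch under assumptions: `link_release` is a hypothetical helper, the default directory is the one quoted in the comment, and the report does not confirm that the link is a supported fix.

```python
import os


def link_release(subarch_dir, existing="trusty", missing="xenial"):
    """Create `missing` as a relative symlink to `existing` inside subarch_dir.

    Returns True if the link was created, False if `missing` already exists.
    """
    target = os.path.join(subarch_dir, missing)
    if os.path.lexists(target):
        # Already a directory or link; leave it untouched.
        return False
    if not os.path.isdir(os.path.join(subarch_dir, existing)):
        raise FileNotFoundError(
            "expected directory %r under %r" % (existing, subarch_dir)
        )
    # Relative link, equivalent to running `ln -s trusty xenial` in subarch_dir.
    os.symlink(existing, target)
    return True


if __name__ == "__main__":
    # Path taken from the command quoted in the comment above.
    created = link_release("/var/lib/maas/boot-resources/current/ubuntu/amd64/hwe-t")
    print("link created" if created else "link already present")
```

Checking for an existing entry first avoids the error that `ln -s` raises when the name is already taken.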

Andres Rodriguez (andreserl) wrote :

**This is an automated message**

We believe this issue has now been fixed in the latest MAAS release. As such, we are marking this as Fix Released. If you believe this is still an issue, please re-open this bug report and provide any relevant information.

Changed in maas:
milestone: 2.3.x → 2.4.x