Custom images which worked OK are not working with 3.2

Bug #2020397 reported by Seyeong Kim
This bug affects 5 people
Affects  Status        Importance  Assigned to                 Milestone
MAAS     Fix Released  High        Adam Collard
3.1      Fix Released  High        Mauricio Faria de Oliveira
3.2      Fix Released  High        Adam Collard
3.3      Fix Released  High        Adam Collard
3.4      Fix Released  High        Adam Collard

Bug Description

The customer said that after upgrading to MAAS 3.2, they can no longer use their custom images.

The customer did some analysis and found that get_custom_image_dependency_validation introduced this issue.

The error is below [1].

I'm trying to test this as well, but it takes time. In the meantime, I would appreciate any advice.

If they bypass the validation function, it works.

I'll reinforce the info after testing.

Thanks.

[1]

finish: cmd-install/stage-late/98-validate-custom-image-has-cloud-init/cmd-in-target: FAIL: curtin command in-target
curtin: Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'in-target', '--', 'bash', '-c', 'dpkg-query -s cloud-init || (echo "cloud-init not detected, MAAS will not be able to configure this machine properly" && exit 1)']
Exit code: 1
Reason: -
Stdout: start: cmd-install/stage-late/98-validate-custom-image-has-cloud-init/cmd-in-target: curtin command in-target
Running command ['mount', '--bind', '/dev', '/tmp/tmpx2g387o1/target/dev'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/proc', '/tmp/tmpx2g387o1/target/proc'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/run', '/tmp/tmpx2g387o1/target/run'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/sys', '/tmp/tmpx2g387o1/target/sys'] with allowed return codes [0] (capture=False)
Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpx2g387o1/target', 'bash', '-c', 'dpkg-query -s cloud-init || (echo "cloud-init not detected, MAAS will not be able to configure this machine properly" && exit 1)'] with allowed return codes [0] (capture=False)
bash: dpkg-query: command not found
cloud-init not detected, MAAS will not be able to configure this machine properly
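
Why the step fails: the target image has no `dpkg-query`, so bash reports "command not found" (exit status 127), the `||` fallback prints the warning and exits 1, and curtin only accepts return code 0. A minimal sketch of that behaviour, where `run_check` and the missing tool name are hypothetical stand-ins rather than anything MAAS or curtin ships:

```shell
# Same shape as the late command in the log, with the query tool passed in.
run_check() {
    query_tool=$1
    bash -c "$query_tool -s cloud-init >/dev/null 2>&1 || (echo 'cloud-init not detected' && exit 1)"
}

# A tool name that does not exist plays the role of dpkg-query on the target.
run_check definitely-missing-query-tool || echo "check failed with exit status $?"
```

With a tool that exists and succeeds (for example `run_check true`), the fallback never runs and the exit status is 0.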

Seyeong Kim (seyeongkim)
tags: added: sts

Mauricio Faria de Oliveira (mfo) wrote :

Workaround: (bypass `dpkg-query` for the custom image validation commands)

Edit `curtin_userdata_custom`:
- On debs, it's in `/etc/maas/preseeds/`.
- On snaps, it's in `/var/snap/maas/current/preseeds/`; copy the file
  `curtin_userdata_custom.sample` as `curtin_userdata_custom`.

Add the following 2 lines to `late_commands` (indented with 2 spaces):

  00-validate-custom-image-aaa-workaround: ['curtin', 'in-target', '--', '/bin/sh', '-c', 'ln -s $(which true) /usr/local/bin/dpkg-query']
  99-validate-custom-image-zzz-workaround: ['curtin', 'in-target', '--', '/bin/sh', '-c', 'rm -f /usr/local/bin/dpkg-query']
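
The workaround relies on PATH lookup: once a `dpkg-query` that always exits 0 is on the PATH, the validation commands succeed whatever package they query. A self-contained sketch of that effect, assuming a throwaway directory in place of `/usr/local/bin` and a two-line stub script in place of the `ln -s $(which true)` symlink (`check_with_stub` is a hypothetical helper):

```shell
check_with_stub() {
    pkg=$1
    stub_dir=$(mktemp -d)   # stands in for /usr/local/bin in the target

    # Stand-in for: ln -s $(which true) /usr/local/bin/dpkg-query
    printf '#!/bin/sh\nexit 0\n' > "$stub_dir/dpkg-query"
    chmod +x "$stub_dir/dpkg-query"

    # Same shape as the validation command, with the stub first on PATH.
    status=0
    PATH="$stub_dir:$PATH" sh -c "dpkg-query -s $pkg || (echo '$pkg not detected' && exit 1)" || status=$?

    # Mirrors the cleanup step: remove the stub again.
    rm -rf "$stub_dir"
    return "$status"
}

# Even a package that cannot exist "passes" while the stub is in place.
check_with_stub no-such-package && echo "validation passed"
```

The real workaround removes its symlink in the second late command for the same reason: the deployed system should not be left with a fake `dpkg-query`.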

...

Before:

 ... cloud-init[1605]: finish: cmd-install/stage-late/98-validate-custom-image-has-cloud-init: FAIL: running 'curtin in-target -- bash -c dpkg-query -s cloud-init || (echo "cloud-init not detected, ..." && exit 1)'
 ...
 ... cloud-init[1605]: curtin: Installation failed with exception: Unexpected error while running command.

After:

 ... cloud-init[1636]: finish: cmd-install/stage-late/00-validate-custom-image-aaa-workaround: SUCCESS: running 'curtin in-target -- /bin/sh -c ln -s $(which true) /usr/local/bin/dpkg-query'
 ...
 ... cloud-init[1636]: finish: cmd-install/stage-late/98-validate-custom-image-has-cloud-init: SUCCESS: running 'curtin in-target -- bash -c dpkg-query -s cloud-init || (echo "cloud-init not detected, ..." && exit 1)'
 ...
 ... cloud-init[1636]: finish: cmd-install/stage-late/99-validate-custom-image-has-netplan.io: SUCCESS: running 'curtin in-target -- bash -c dpkg-query -s netplan.io || (echo "netplan.io not detected, ..." && exit 1)'
 ...
 ... cloud-init[1636]: finish: cmd-install/stage-late/99-validate-custom-image-zzz-workaround: SUCCESS: running 'curtin in-target -- /bin/sh -c rm -f /usr/local/bin/dpkg-query'
 ...
 ... cloud-init[1636]: curtin: Installation finished.
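
The workaround keys are named so that they bracket the validation steps. Assuming late commands run in lexicographic key order, which is consistent with the log above, plain sorting reproduces the sequence (`late_command_order` is a hypothetical helper):

```shell
late_command_order() {
    # Keys from the log, deliberately listed out of order.
    printf '%s\n' \
        99-validate-custom-image-zzz-workaround \
        98-validate-custom-image-has-cloud-init \
        00-validate-custom-image-aaa-workaround \
        99-validate-custom-image-has-netplan.io | LC_ALL=C sort
}

late_command_order
```

The `aaa` key sorts first and the `zzz` key sorts last, so the stub exists for both validation steps and is removed afterwards.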

Alan Baghumian (alanbach) wrote :

The issue is, currently from MAAS' perspective, a custom image really is a custom version of the supported Operating Systems - such as CentOS, RHEL and Ubuntu.

This was not the case with MAAS 2.9 and older, which let clients upload anything they wanted - ToasterOS FWIW - as "custom" and let MAAS happily deploy them.

This has changed since MAAS 3.0 with the introduction of extra validations for "custom" images, which kind of kills the entire point of having the "custom" option available.

The above hack works and lets you bypass the check when a custom image is uploaded while omitting the base_image parameter, but I ultimately believe this is something that needs to be fixed / reverted in the source code.

I understand we are not in a position of dictating any development policies to the MAAS team - but at the end of the day it is not a big ask and is simply something that will make a lot of users who rely on the custom image functionality much happier.

Thank you.

Mauricio Faria de Oliveira (mfo) wrote :

IMHO, this doesn't have to be _reverted_, and likely shouldn't,
as it's reasonable for maas to validate that the deployed image
has the tooling maas expects, for correct operation after boot.

_However_, it's also reasonable to allow users to tell maas not
to care about it, as there are scenarios where that's legitimate
usage (I loved the ToasterOS example! and there are cases which
don't even require/have a package manager available for reasons).

The thing is, the number of use cases for the latter is way lower
than the former, AFAICT, so it seems like a 'compromise solution'
is to provide such an option (do not validate some custom images)
to the users who/images which need them, and not to revert checks
that may indeed help other users (who could even expect them, for
rigor, to avoid booting into non-configurable systems) later on.

tags: added: bug-council

Seyeong Kim (seyeongkim) wrote :

I tried to upload a patch that simply skips the verification.
After analyzing further, I found that a custom image's base field in boot-resource defaults to ubuntu/focal,

so it runs dpkg-query.

I'm off until tomorrow. I think we need to set the default base to custom (or something similar) rather than taking ubuntu/focal from the commissioning conf.
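
The behaviour described in this thread can be summarised as a mapping from base_image to the checks that run. The sketch below is not the MAAS source: `validation_commands` is a hypothetical helper, the commands are the ones that appear in the logs in this report, and treating any other base_image as an error is an assumption of the sketch.

```shell
validation_commands() {
    # An empty base_image falls back to ubuntu/focal, as reported above.
    base_image=${1:-ubuntu/focal}
    case "$base_image" in
        ubuntu/*)
            echo "dpkg-query -s cloud-init"
            echo "dpkg-query -s netplan.io"
            ;;
        centos/*)
            echo "rpm -q cloud-init"
            ;;
        *)
            # Assumption: anything else is rejected here.
            echo "unsupported base_image: $base_image" >&2
            return 1
            ;;
    esac
}

validation_commands ""                # no base_image: dpkg-query checks
validation_commands centos/centos70   # rpm check only, no netplan check
```

This is why an image uploaded without base_image gets the `dpkg-query` checks even when it is not Debian-based.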

Mauricio Faria de Oliveira (mfo) wrote :

Seyeong, I looked a bit into that option, but I'm not sure that changing the base image to 'custom' would work, as it is used as the ephemeral OS used to deploy the image [1], and must be a simplestreams/non-custom image [2].

[1] https://maas.io/docs/about-images#heading--about-how-maas-handles-these-images

"The base_image field is used to select the appropriate version of the ephemeral OS to avoid errors."

[2] https://maas.io/docs/about-images#heading--how-images-deploy

"This ephemeral OS must be one of the images supplied by the MAAS simplestreams and cannot be a custom OS image.
... This ephemeral OS is not deployed, but it is used to download and deploy the image you’ve chosen for the machine."

Mauricio Faria de Oliveira (mfo) wrote :

I proposed another approach on MR [1], aligned with the rationale in comment #3.

[1] https://code.launchpad.net/~mfo/maas/+git/maas/+merge/444127

Seyeong Kim (seyeongkim) wrote :

Thanks for checking. In that case, I guess the only way is to bypass it, as in your merge proposal.

Alan Baghumian (alanbach) wrote :

I was curious, so I ran a few tests to see how the deployment of custom images behaves with respect to the base_image database field. I used my custom CentOS 7.9 image for these tests.

The image was imported using the following command:

$ maas homelab boot-resources create name='custom/centos7.9-2023-04-10-01' title='CentOS 7.9 Custom 2023-05-28-02' architecture='amd64/generic' filetype='tgz' content@=centos7.tar.gz

Here is what was tested:

1. First things first, I disabled the late_commands workaround lines, performed a test deployment and verified that the dpkg-query error was back.

late_commands:
  #00-validate-custom-image-aaa-workaround: ['curtin', 'in-target', '--', '/bin/sh', '-c', 'ln -s $(which true) /usr/local/bin/dpkg-query']
  #99-validate-custom-image-zzz-workaround: ['curtin', 'in-target', '--', '/bin/sh', '-c', 'rm -f /usr/local/bin/dpkg-query']

2. Updated the database, replaced the base_image field content with 'custom'. Was not able to initiate the deployment. It errors right on the WebUI.

maasdb=# update maasserver_bootresource set base_image='custom' where id=1018;

3. Updated the database, replaced the base_image field content with 'custom/custom'. Similar to #2, I was not able to initiate the deployment. It errors right on the WebUI.

maasdb=# update maasserver_bootresource set base_image='custom/custom' where id=1018;

4. Updated the database, blanked the base_image field content. The deployment started and ended with the same dpkg-query error.

maasdb=# update maasserver_bootresource set base_image='' where id=1018;

Changed in maas:
status: New → Won't Fix
status: Won't Fix → Confirmed

Jerzy Husakowski (jhusakowski) wrote :

Is this issue reproducible when the custom CentOS image is uploaded with base_image="centos"? Can we see more information about the image that causes this issue to appear? The output of `maas $profile boot-resources read` would be very useful, as well as the dump of maasserver_bootresource DB table.

Alan Baghumian (alanbach) wrote :

@Jerzy,

I just ran a couple of quick tests:

1. Setting base_image to 'centos' does not allow deploying (See screenshot for the GUI error)

2. Setting the base_image to 'centos/7' starts the deployment, however the machine keeps rebooting at the "Configuring OS" stage. This keeps looping until you abort.

Alan Baghumian (alanbach) wrote :

Please see the database contents here: https://pastebin.ubuntu.com/p/ZVsNNFtrqD/

maasdb=# select * from maasserver_bootresource;

Seyeong Kim (seyeongkim) wrote :

I followed Alan's steps and hit the following issue while deploying:

https://pastebin.ubuntu.com/p/3hQpMwt5Ck/

Jerzy Husakowski (jhusakowski) wrote :

Thanks Alan. Could you try uploading the centos image once more, this time with base_image set to centos/centos70 (and not centos/7)?

So far, to MAAS it looks like the custom CentOS 7.9 image is not rhel-like, but debian-like, which leads to the issues reported above. Either there's an issue in how MAAS recognizes the type of the distro, or the customizations of the image confuse MAAS to think it's something else.

Seyeong Kim (seyeongkim) wrote :

hey @jhusakowski

the result is below

maas xtrusia boot-resources create name='centos/custom' title='centos custom' architecture='amd64/generic' filetypes='tgz' content@=centos7.tar.gz base_image="centos/centos70"

https://pastebin.ubuntu.com/p/VBKr5BPzr9/

Jerzy Husakowski (jhusakowski) wrote :

Could you share how you build the centos7.tar.gz file, and perhaps the file itself? It looks like curtin has trouble figuring out that the image is actually CentOS.

Alan Baghumian (alanbach) wrote :

Hi @jhusakowski

So sorry about the delay on this test. I've been very busy.

I just manually updated the database record:

maasdb=# update maasserver_bootresource set base_image='centos/70' where id=1018;
UPDATE 1

Then performed a deployment and it worked like a charm!!

Best,
Alan

Jerzy Husakowski (jhusakowski) wrote :

We will try to reproduce this locally, at this stage it's not clear why it works with 'centos/70' but doesn't with 'centos/centos70'.

It would help us if you can share your image or the build recipe used to create it.

Changed in maas:
importance: Undecided → Medium
milestone: none → 3.5.0
status: Confirmed → Triaged

Alberto Donato (ack) wrote :

I've tried to upload the official MAAS CentOS 7 image as a custom image with the following command:

maas local boot-resources create name='centos/custom' title='centos custom' architecture='amd64/generic' filetypes='tgz' content@=root-tgz base_image="centos/centos70"

Deployment works fine.

Note that the name has to be "centos/<something>", and base_image should be "centos/centos70"

Alan Baghumian (alanbach) wrote :

Hi Jerzy,

I used the standard packer-maas process:

$ git clone <email address hidden>:canonical/packer-maas.git
$ cd centos7
$ sudo PACKER_LOG=1 packer build centos7.json

Then used the produced centos7.tar.gz to import to MAAS.

Best,
Alan

Alan Baghumian (alanbach) wrote :

Let's focus on the get_custom_image_dependency_validation and not get distracted.

Any updates regarding the merge request submitted by Mauricio that was proposed to solve this issue?

Thanks much,
Alan

(1) https://code.launchpad.net/~mfo/maas/+git/maas/+merge/444127

Mauricio Faria de Oliveira (mfo) wrote :

Hi Alan,

If I may help clarify.

That MR was rejected (as in this bug's header); technical details are in the internal chat referenced in our ticket system.

The code changes are a parallel thread with the requests for information from the MAAS team, which don't seem to be a distraction, but probably collect insight required for analyzing the request and options.

I should soon resume the other parallel thread of code change proposals.

Thanks for the reminder.
Mauricio

Thorsten Merten (thorsten-merten) wrote :

Hi @alan: How exactly did you upload the image that you created?
We will also try to reproduce this bug on 3.3 in the meantime and check if the instructions on packer-maas are accurate.

Changed in maas:
status: Triaged → Incomplete

Thorsten Merten (thorsten-merten) wrote :

see comment above

Alberto Donato (ack)
Changed in maas:
milestone: 3.5.0 → 3.3.x
no longer affects: maas/3.4
Changed in maas:
status: Incomplete → Triaged

Alberto Donato (ack) wrote :

I can't reproduce the issue with 3.3 (via snap from 3.3/stable).

I've built a custom image with packer-maas with the same steps as described above, and uploaded with the same command as comment #18.
Deployment worked fine.

Changed in maas:
milestone: 3.3.x → 3.2.x

Mauricio Faria de Oliveira (mfo) wrote :

Hi Alberto,

Thanks for looking at this and testing.

I think the issue reported is not necessarily particular to the recent tests performed, i.e., a custom image might not be centos-based, or built w/ packer-maas (iirc the maas docs mention other ways can be used), or have one of the package manager commands at all (eg, chiselled or locked down deployments).

I can appreciate the value of the other point being discussed/tested, but just would like to clarify that there is still an issue / regression of such images, which used to deploy fine prior to 3.2, and that ideally we would have a way for power users to 'hint' some specific validation is known to fail with their image (or provide their own), and not fail the deployment due to that, if at all possible.

I'll submit another MR that hopefully is succinct enough, but I realize all this involves a lot more than just source code changes.

Thanks again,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Hi MAAS bug-council,

Could you please consider this last attempt for a code change
to help users w/ custom images without a package manager? [1]

It's certainly understandable if the MAAS direction is not to
allow this _at all_, but in that case, we likely should have
something for such users in our docs (I can send patches too).

Please let me know your thoughts / next steps to take.

Thanks,
Mauricio

[1] https://code.launchpad.net/~mfo/maas/+git/maas/+merge/446216

Jerzy Husakowski (jhusakowski) wrote :

MAAS requires deployed images to have cloud-init capability at the minimum, and on Ubuntu-based distros also netplan. The current way it checks for this is unintentionally fragile: it queries for a package using an assumed package manager. We will improve the robustness of how it checks for the required dependencies.

Removing the check, or allowing custom overrides, makes troubleshooting of deployment issues harder. Custom images for MAAS that are built with packer-maas will work, others may not and we can't offer any guarantees that arbitrary images will smoothly deploy.

tags: removed: bug-council
Changed in maas:
status: Triaged → Fix Committed

Mauricio Faria de Oliveira (mfo) wrote :

Hi Jerzy, Adam,

Thanks for the clarification and code changes in this regard.

cheers,
Mauricio

Changed in maas:
milestone: 3.2.x → 3.5.0
assignee: nobody → Adam Collard (adam-collard)
Changed in maas:
importance: Medium → High

Adam Collard (adam-collard) wrote :

Just to clarify, we will still validate the presence of cloud-init (and in the case of Ubuntu based images, netplan) and now do that by executing them inside the ephemeral environment (cloud-init --version, netplan info). This should be a more robust check, and avoids the need for a package manager.
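
A rough sketch of the executable-based check described above, assuming nothing about the actual MAAS implementation beyond the two commands named in this comment; `require_tool` and `validate_image` are hypothetical helpers:

```shell
# Passes only if the given command actually runs and exits 0.
require_tool() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    echo "$label not detected, MAAS will not be able to configure this machine properly"
    return 1
}

validate_image() {
    base_os=$1
    require_tool cloud-init cloud-init --version || return 1
    # netplan is only required for Ubuntu-based images.
    if [ "$base_os" = "ubuntu" ]; then
        require_tool netplan netplan info || return 1
    fi
}

validate_image ubuntu || echo "validation failed"
```

Because the check runs the tools themselves, it does not matter which package manager installed them, or whether the image has one at all.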

Nicholas Fries (nicfries) wrote :

FYI - 3.2.9 broke our custom images, due to the netplan check being introduced. Some of our custom images are not Ubuntu based and we use curtin-hooks to bridge the gap.

Mauricio Faria de Oliveira (mfo) wrote :

Hello Alan, Seyeong, or anyone else affected,

Accepted maas 3.1.3 into ppa:maas/3.1-next and snap:3.1/edge
(code version 3.1.3-10930-g.2eb4e7525).

Please test this update and provide your feedback on this bug:

If it fixes the bug for you, please add a comment mentioning the version you tested and what testing has been performed, and change the tag from verification-needed-maas-3.1 to verification-done-maas-3.1.

If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-maas-3.1.

The update will be released after the bug(s) have been verified and quality assurance testing is successful.

Thank you in advance for helping!

tags: added: verification-needed-maas-3.1

Mauricio Faria de Oliveira (mfo) wrote :

Verification done with MAAS 3.1.3.

Deployment of CentOS7 image from images.maas.io uploaded
as custom image with field base_image='centos/centos70'.

On MAAS 3.1.2: only cloud-init is checked via `rpm -q cloud-init`
On MAAS 3.1.3: only cloud-init is checked via `cloud-init --version`

On both, netplan is not checked (correct as base_image is `centos/*`).

MAAS 3.1.2:

 $ snap list maas
 Name Version Rev Tracking Publisher Notes
 maas 3.1.2-10926-g.5ad7060e5 30495 3.1/stable canonical✓ -

 $ maas admin version read | jq '.version, .subversion'
 "3.1.2"
 ""

 $ wget https://images.maas.io/ephemeral-v3/stable/centos/centos70/amd64/20240128_01/root-tgz

 $ file root-tgz
 root-tgz: gzip compressed data, was "centos7.tar", last modified: Sun Jan 28 12:08:28 2024, max compression, from Unix, original size modulo 2^32 1307934720

 $ maas admin boot-resources create \
   name='custom/custom_centos70' \
   title='custom centos70' \
   base_image='centos/centos70' \
   architecture='amd64/generic' \
   filetypes='tgz' \
   content@=./root-tgz

 $ maas admin boot-resources read | jq -r 'map(select(.name == "custom_centos70"))'
 [
   {
     "id": 12,
     "type": "Uploaded",
     "name": "custom_centos70",
     "architecture": "amd64/generic",
     "resource_uri": "/MAAS/api/2.0/boot-resources/12/",
     "subarches": "generic",
     "title": "custom centos70"
   }
 ]

 $ maas admin machines read | jq -r '.[] | .system_id'
 qdccrf

 $ maas admin machine deploy qdccrf distro_series=custom_centos70

  [ 183.654526] cloud-init[1648]: start: cmd-install/stage-late: executing late commands
  [ 183.671299] cloud-init[1648]: start: cmd-install/stage-late/98-validate-custom-image-has-cloud-init: running 'curtin in-target -- bash -c rpm -q cloud-init || (echo "cloud-init not detected, MAAS will not be able to configure this machine properly" && exit 1)'
  ...
  [ 184.183081] cloud-init[1648]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpmzjqkdvw/target', 'bash', '-c', 'rpm -q cloud-init || (echo "cloud-init not detected, MAAS will not be able to configure this machine properly" && exit 1)'] with allowed return codes [0] (capture=False)
  [ 184.235292] cloud-init[1648]: cloud-init-19.4-7.el7.centos.6.x86_64
  ...
  [ 184.457271] cloud-init[1648]: finish: cmd-install/stage-late/98-validate-custom-image-has-cloud-init: SUCCESS: running 'curtin in-target -- bash -c rpm -q cloud-init || (echo "cloud-init not detected, MAAS will not be able to configure this machine properly" && exit 1)'
  ...
  [ 185.438936] cloud-init[1648]: finish: cmd-install/stage-late: SUCCESS: executing late commands

 $ maas admin machines read | jq -r '.[] | .system_id, .status_name'
 qdccrf
 Deployed

 $ maas admin machine release qdccrf

 $ maas admin machines read | jq -r '.[] | .system_id, .status_name'
 qdccrf
 Ready

MAAS 3.1.3:

 $ sudo snap refresh --channel=3.1/edge maas

 $ snap list maas
 Name Version Rev Tracking Publisher Notes
 maas 3.1.3-10930-g.2eb4e7525 33606 3.1/edge canonical✓ -

 $ maas admin version read | jq '.version, .subversion'
 "3.1.3"
 ""

 $ maas admin machine deploy q...

tags: added: verification-done-maas-3.1
removed: verification-needed-maas-3.1

Mauricio Faria de Oliveira (mfo) wrote :

MAAS 3.1.3 has been released:
- deb: ppa:maas/3.1 (1:3.1.3-10930-g.2eb4e7525-0ubuntu1~20.04.1)
- snap: 3.1/stable (3.1.3-10930-g.2eb4e7525)

Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released