ci - partition job failures

Bug #2057972 reported by Julia Kreger
This bug affects 2 people
Affects: Ironic
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

The partition jobs can fail in odd ways, specifically with an error suggesting that /bin/sh is missing from the image:

Mar 14 19:07:16.681311 np0037067775 nova-compute[104237]: ERROR nova.compute.manager [None req-57a35be4-48f3-4686-816f-77efd6b28f41 tempest-BaremetalBasicOps-1051466482 tempest-BaremetalBasicOps-1051466482-project-member] [instance: a2cab0c6-f76d-458a-824d-1526135d7500] Failed to build and run instance: nova.exception.InstanceDeployFailure: Failed to provision instance a2cab0c6-f76d-458a-824d-1526135d7500: Deploy step deploy.prepare_instance_boot failed: Failed to install a bootloader when deploying node e6fca44d-4428-4a93-a9ea-43929f1d9347: Installing GRUB2 boot loader to device /dev/vda failed with Unexpected error while running command.
Mar 14 19:07:16.681311 np0037067775 nova-compute[104237]: Command: chroot /tmp/tmph9x4wezd /bin/sh -c "mount -a -t vfat"
Mar 14 19:07:16.681311 np0037067775 nova-compute[104237]: Exit code: 127
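
For readers triaging similar logs, the exit code is the key hint. GNU coreutils chroot documents 125 as a failure of chroot itself, 126 as "command found but could not be invoked", and 127 as "command could not be found", which is why this failure points at /bin/sh rather than at mount. A minimal sketch of that mapping follows; the helper name explain_chroot_exit is hypothetical and is not part of Ironic:

```python
# Exit statuses documented for GNU coreutils `chroot`.
# Any other value is the exit status of the command that was run.
CHROOT_EXIT_MEANINGS = {
    125: "chroot itself failed",
    126: "command found but could not be invoked",
    127: "command could not be found",
}


def explain_chroot_exit(code: int) -> str:
    """Return a short description of a `chroot` exit status (hypothetical helper)."""
    return CHROOT_EXIT_MEANINGS.get(code, "exit status of the command itself")


if __name__ == "__main__":
    # The log above reports exit code 127 for `chroot ... /bin/sh -c ...`.
    print(explain_chroot_exit(127))
```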

We believe this is rooted in the way the job's base image is created: the cirros kernel and ramdisk are booted, and a partition image is built from the resulting disk contents. Because the cirros disk image is blank by default, this is the only way to "capture" the contents. The current belief is that something in this process results in /bin/sh going missing, which causes the CI job to fail.
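
One way to test this theory outside the CI job would be to mount or unpack the captured partition image and check whether /bin/sh resolves to an executable file inside that tree. The sketch below is only an illustration under assumptions: the function names are hypothetical, the image root is assumed to be already unpacked at a host directory, and only symlinks on the final path component are followed relative to the image root (an absolute symlink on an intermediate directory is not handled).

```python
import os
from typing import Optional


def resolve_in_root(root: str, path: str, max_links: int = 16) -> Optional[str]:
    """Resolve an in-image absolute path against `root`.

    Symlinks on the final component are followed relative to the image
    root, roughly as they would be seen from inside a chroot.
    Returns the host path of the regular file, or None if it is missing.
    """
    current = path
    for _ in range(max_links):
        host_path = os.path.join(root, current.lstrip("/"))
        if os.path.islink(host_path):
            target = os.readlink(host_path)
            if not os.path.isabs(target):
                # Relative targets are relative to the link's directory.
                target = os.path.join(os.path.dirname(current), target)
            # normpath keeps the result anchored at "/" inside the image.
            current = os.path.normpath(target)
            continue
        return host_path if os.path.isfile(host_path) else None
    return None  # too many levels of symlinks


def has_usable_shell(root: str) -> bool:
    """Return True if /bin/sh inside `root` resolves to an executable file."""
    resolved = resolve_in_root(root, "/bin/sh")
    return resolved is not None and os.access(resolved, os.X_OK)
```

Called as has_usable_shell("/path/to/unpacked/root"), a False result would be consistent with the exit code 127 seen in the log, although it would not by itself explain why the file is absent.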

OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.opendev.org/c/openstack/ironic/+/913270
Committed: https://opendev.org/openstack/ironic/commit/10ebbe74dad92647746bfe7d36f69175f0df3146
Submitter: "Zuul (22348)"
Branch: master

commit 10ebbe74dad92647746bfe7d36f69175f0df3146
Author: Iury Gregory Melo Ferreira <email address hidden>
Date: Thu Mar 14 18:36:47 2024 -0300

    Tempest test with only wholedisk for some jobs

    Changing the ironic-tempest-uefi-redfish-vmedia and
    ironic-tempest-ovn-uefi-ipmi-pxe jobs to only run
    tempest test_baremetal_server_ops_wholedisk_image.

    We saw failures on the partition tests for this jobs.

    Related-Bug: #2057972
    Change-Id: I2e26d7955ade11046bf89b6f4c9c2c4f16da1574

Julia Kreger (juliaashleykreger) wrote :

This should largely be resolved by https://review.opendev.org/c/openstack/ironic/+/914772 which unpacks the cirros images before uploading them.

Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
status: Triaged → Fix Released