Image-based Fedora 26 failing to build

Bug #1713381 reported by Ian Wienand
This bug affects 1 person
Affects: diskimage-builder
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Building Fedora 26 (image based, not fedora-minimal) fails reliably with:

---
2017-08-28 00:27:59.462307 | [fedora/build-succeeds] dib-run-parts Mon Aug 28 00:27:59 UTC 2017 Running /tmp/in_target.d/finalise.d/01-clean-old-kernels
...
2017-08-28 00:27:59.464828 | [fedora/build-succeeds] ++ dnf repoquery --installonly --latest-limit=-1
...
2017-08-28 00:28:01.408018 | [fedora/build-succeeds] Could not determine your machine ID from /etc/machine-id.
2017-08-28 00:28:01.408055 | [fedora/build-succeeds] Please run 'systemd-machine-id-setup' as root. See man:machine-id(5)
2017-08-28 00:28:01.408095 | [fedora/build-succeeds] error: %preun(kernel-core-4.11.8-300.fc26.x86_64) scriptlet failed, exit status 1
---

I've chased this down to, I believe, [1]. The difference is that previously that script short-circuited the "/etc/machine-id" checks in kernel-install; now it is a plugin, and those checks happen before the plugin is called, leading to this failure.

What to do ... I'm not sure right now :/

[1] https://src.fedoraproject.org/rpms/systemd/c/12da227455a6872d695cdcac1093b6b5fe9f9008?branch=master
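For anyone reproducing this, a quick way to see whether a chroot is in the failing state is to check for a non-empty machine-id before removing a kernel. This is only a minimal sketch: `machine_id_present` is a hypothetical helper name, and it assumes the kernel-install check amounts to "the file exists and is non-empty", which is what the error message suggests.

```shell
#!/bin/bash
# Hypothetical helper (not part of diskimage-builder): succeeds only if
# <root>/etc/machine-id exists and is non-empty. Pass "" to check the
# running system or chroot itself.
machine_id_present() {
    local root="${1:-}"
    [ -s "${root}/etc/machine-id" ]
}
```

Inside the image chroot this could be used as `machine_id_present "" || echo "no machine-id; kernel-core %preun is likely to fail"`.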

Ian Wienand (iwienand) wrote :

Two things: firstly, the F25 build seems to have a machine-id

---
[root@dib-xenial /]# cat /etc/machine-id
afcaeb9a88f5444691008ddc87a43402
---

but it also works without a machine-id

---
[root@dib-xenial /]# dnf repoquery --installonly --latest-limit=-1 -q
kernel-core-0:4.8.6-300.fc25.x86_64
[root@dib-xenial /]# dnf remove -y kernel-core-0:4.8.6-300.fc25.x86_64
Dependencies resolved.
===============================================================================================================================================================================================================
 Package Arch Version Repository Size
===============================================================================================================================================================================================================
Removing:
 kernel-core x86_64 4.8.6-300.fc25 @anaconda 52 M

Transaction Summary
===============================================================================================================================================================================================================
Remove 1 Package

Installed size: 52 M
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
grubby fatal error: unable to find a suitable template
grubby: doing this would leave no kernel entries. Not writing out new config.
grubby fatal error: unable to find a suitable template
grubby: doing this would leave no kernel entries. Not writing out new config.
  Erasing : kernel-core-4.8.6-300.fc25.x86_64 1/1
warning: file /lib/modules/4.8.6-300.fc25.x86_64/updates: remove failed: No such file or directory
  Verifying : kernel-core-4.8.6-300.fc25.x86_64 1/1

Removed:
  kernel-core.x86_64 4.8.6-300.fc25

Complete!
---

Ian Wienand (iwienand) wrote :

Urgh, so looking at [1] and local testing -- I guess this fails at install time too, but dnf doesn't return an error code then. It's only on the cleanup path, when we're just uninstalling the one kernel, that it fails and breaks the build.

Hacking in a /etc/machine-id might be the solution here

I have filed [2]

[1] http://logs.openstack.org/34/497734/3/check/gate-dib-dsvm-functests-python2-ubuntu-trusty-image-nv/b9cf382/console.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1486124
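The "hack in a machine-id" idea could look roughly like the sketch below. It is not a tested fix for dib: `ensure_machine_id` is a hypothetical helper name, and it assumes any 32-character lowercase hex ID is enough to get past the kernel-install check. It writes a throwaway random ID only if none exists, and reports which case it hit so the caller can remove a temporary ID afterwards.

```shell
#!/bin/bash
# Hypothetical helper: make sure <root>/etc/machine-id is non-empty.
# Prints "existing" if one was already there, "created" if a temporary
# one was written. Assumes <root>/etc already exists.
ensure_machine_id() {
    local root="${1:-}"
    local path="${root}/etc/machine-id"
    if [ -s "${path}" ]; then
        echo "existing"
        return 0
    fi
    # 16 random bytes rendered as 32 lowercase hex characters, followed
    # by a newline (the layout described in machine-id(5)).
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n' > "${path}"
    echo >> "${path}"
    echo "created"
}
```

A cleanup script could call `state=$(ensure_machine_id "")` before the `dnf remove`, then delete the file again if `state` is `created`. On a full system, `systemd-machine-id-setup` (the tool named in the error message) would be the more natural way to create the ID.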

Ian Wienand (iwienand) wrote :

Ok, so I've determined a few things

Firstly, we are always creating /etc/machine-id with a systemd-machine-id-setup call in the %post phase of the systemd spec [1]

This explains why we don't see a machine-id on Fedora 26 now, as a new release: for this short period of time, systemd hasn't been updated in Fedora 26 and so isn't being re-installed when using the image-based builds.

So, we have an additional problem: we're leaving the machine-id in the final image. This can be verified by looking at basically all the infra nodes, which end up using the same machine-id. This has actually come up in [2] and I think we will need to take that approach. I've been testing that and will see how we go.

[1] https://src.fedoraproject.org/rpms/systemd/blob/master/f/systemd.spec#_464
[2] https://review.openstack.org/#/c/489013
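To illustrate the "don't ship a machine-id" side of this, here is a minimal sketch of clearing the ID as a late finalise step. It is not taken from the review in [2]; `reset_machine_id` is a hypothetical helper name, and the sketch assumes that an empty /etc/machine-id is treated by systemd as uninitialised and regenerated on first boot, as machine-id(5) describes. It would have to run after the kernel cleanup, since that step is what needs the ID to be present.

```shell
#!/bin/bash
# Hypothetical helper: truncate <root>/etc/machine-id so every instance
# booted from the image gets its own ID. The file is left in place but
# empty rather than deleted. Does nothing if the file is absent.
reset_machine_id() {
    local root="${1:-}"
    local path="${root}/etc/machine-id"
    if [ -e "${path}" ]; then
        : > "${path}"
    fi
}
```

Whether to truncate or remove the file outright is a design choice; truncating keeps the path present for setups where /etc is read-only at first boot.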
