bootloader element fails to install grub on ppc64 platform

Bug #1674402 reported by Mikhail S Medvedev
Affects: diskimage-builder
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned

Bug Description

After the 2.0.0 release of diskimage-builder, the bootloader element is no longer able to run grub-install on ppc64:

2017-03-20 14:20:40,063 INFO nodepool.image.build.devstack-xenial: ++ /usr/sbin/grub-install --version
2017-03-20 14:20:40,064 INFO nodepool.image.build.devstack-xenial: + [[ /usr/sbin/grub-install (GRUB) 2.02~beta2-36ubuntu3.8 =~ 2\. ]]
2017-03-20 14:20:40,064 INFO nodepool.image.build.devstack-xenial: + '[' -d /sys/firmware/efi ']'
2017-03-20 14:20:40,064 INFO nodepool.image.build.devstack-xenial: + [[ ppc64el =~ ppc ]]
2017-03-20 14:20:40,065 INFO nodepool.image.build.devstack-xenial: + /usr/sbin/grub-install --modules=part_msdos --force /dev/loop0 --no-nvram
2017-03-20 14:20:40,111 INFO nodepool.image.build.devstack-xenial: Installing for powerpc-ieee1275 platform.
2017-03-20 14:20:47,956 INFO nodepool.image.build.devstack-xenial: /usr/sbin/grub-install: warning: unknown device type loop0p1
2017-03-20 14:20:47,956 INFO nodepool.image.build.devstack-xenial: .
2017-03-20 14:20:48,026 INFO nodepool.image.build.devstack-xenial: /usr/sbin/grub-install: error: the chosen partition is not a PReP partition.

The same build with 1.28.0 succeeds:

2017-03-20 07:31:55,355 INFO nodepool.image.build.devstack-xenial: ++ /usr/sbin/grub-install --version
2017-03-20 07:31:55,357 INFO nodepool.image.build.devstack-xenial: + [[ /usr/sbin/grub-install (GRUB) 2.02~beta2-36ubuntu3.8 =~ 2\. ]]
2017-03-20 07:31:55,357 INFO nodepool.image.build.devstack-xenial: + '[' -d /sys/firmware/efi ']'
2017-03-20 07:31:55,357 INFO nodepool.image.build.devstack-xenial: + [[ ppc64el =~ ppc ]]
2017-03-20 07:31:55,358 INFO nodepool.image.build.devstack-xenial: + /usr/sbin/grub-install --modules=part_msdos --force /dev/mapper/loop0p1 --no-nvram
2017-03-20 07:31:55,404 INFO nodepool.image.build.devstack-xenial: Installing for powerpc-ieee1275 platform.
2017-03-20 07:32:02,527 INFO nodepool.image.build.devstack-xenial: Installation finished. No error reported.

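One change apparent from the logs is that the device passed to grub-install is different: the failed build targets the whole loop device, the successful build a /dev/mapper partition: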

/usr/sbin/grub-install --modules=part_msdos --force /dev/loop0 --no-nvram # from failed build
/usr/sbin/grub-install --modules=part_msdos --force /dev/mapper/loop0p1 --no-nvram # from successful build

Tags: ppc64
Mikhail S Medvedev (msmedved) wrote :

Adding builder log for a successful build with dib 1.28.0

Ian Wienand (iwienand) wrote :

I think I see the broad strokes of what's gone wrong here. In vm/block-device.d/10-partition we had a branch for ppc which sets up two partitions; this PReP boot partition and the main one. I don't see that we've migrated this correctly to the new bootloader.

I think this is just a matter of setting different default bootloader configs for ppc

[1] https://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/vm/block-device.d/10-partition?h=1.28.0#n13

OpenStack Infra (hudson-openstack) wrote : Fix proposed to diskimage-builder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/447739

Changed in diskimage-builder:
assignee: nobody → Ian Wienand (iwienand)
status: New → In Progress
Ian Wienand (iwienand) wrote :

I haven't had much luck with a local replication of this.

I've applied [1] which seems pretty good, but unfortunately, it seems like qemu doesn't like working with ppc64le binaries and fails with an invalid exception [2]

---
I: Running command: chroot /opt/tmp/dib_build.0e0oTMxc/mnt /debootstrap/debootstrap --second-stage
Invalid instruction
NIP 000000400087115c LR 0000004000849ea4 CTR 000000000000000a XER 0000000000000000 CPU#0
MSR 8000000002806001 HID0 0000000000000000 HF 0000000002806001 idx 0
TB 00769329 3304244275513403
GPR00 00000040008453b8 0000004000843010 000000400089be00 0000004000843070
GPR04 0000000000000000 0000000000000280 0000000000000000 0000000000000000
GPR08 000000000000000a 0000004000843070 0000000000000000 0000000000000000
GPR12 0000004000843070 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 00000040008435c0 0000000000000000 0000000000000000
CR 40000002 [ G - - - - - - E ] RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
qemu: uncaught target signal 4 (Illegal instruction) - core dumped
Illegal instruction (core dumped)
---

This might be worth a bug, Mikhail?

This is what I was using and got that far ...

---
DIB_DISTRIBUTION_MIRROR=http://ports.ubuntu.com/ \
  ARCH=ppc64el \
  TMPDIR=/opt/tmp \
  DIB_IMAGE_CACHE=/opt/image-cache \
    disk-image-create -x -o test.qcow2 ubuntu-minimal vm
---

I *think* I've got all the qemu bits installed correctly

---
(env) ubuntu@dib-xenial:~/diskimage-builder$ dpkg --list | grep qemu
ii ipxe-qemu 1.0.0+git-20150424.a25a16d-1ubuntu1 all PXE boot firmware - ROM images for qemu
ii qemu-block-extra:amd64 1:2.5+dfsg-5ubuntu10.9 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-slof 20151103+dfsg-1ubuntu1 all Slimline Open Firmware -- QEMU PowerPC version
ii qemu-system-common 1:2.5+dfsg-5ubuntu10.9 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-ppc 1:2.5+dfsg-5ubuntu10.9 amd64 QEMU full system emulation binaries (ppc)
ii qemu-user 1:2.5+dfsg-5ubuntu10.9 amd64 QEMU user mode emulation binaries
ii qemu-user-static 1:2.5+dfsg-5ubuntu10.9 amd64 ...


Mikhail S Medvedev (msmedved) wrote :

>I've applied [1] which seems pretty good, but unfortunately, it seems like qemu doesn't like working with ppc64le binaries and fails with an invalid exception [2]
>
> This might be worth a bug, Mikhail?

Hmm, maybe. Did you run diskimage-builder inside a ppc64 VM? I believe the chroot environment does not let you easily build an image for a different arch without some extra tinkering. I was not able to do a cross-platform build of ppc from an x86 chroot environment, so I have been building inside a VM of the target arch.

Also, it appears that the current diskimage-builder test suite does not cover the partition code. In its absence I have created https://review.openstack.org/#q,I2b62d6f9888237488f5bcc9cdf2aa86dc40eba95,n,z on top of the proposed fix. The ppc64 test currently fails with

    exception [Unknown flag [prep] in partitioning for [boot]]
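
For readers unfamiliar with this kind of failure, here is a minimal sketch of how a flag check could produce a message of that shape. It is not the diskimage-builder code: the set of known flags, the exception class and both function names are assumptions made for illustration only.

```python
# Assumption: the flags accepted by the parser; the real list may differ.
KNOWN_FLAGS = {"boot", "primary"}


class PartitionConfigError(Exception):
    """Hypothetical error type standing in for the builder's own exception."""


def check_flags(partition_name, flags, known=KNOWN_FLAGS):
    """Reject any flag that is not in the known set."""
    for flag in flags:
        if flag not in known:
            raise PartitionConfigError(
                "Unknown flag [%s] in partitioning for [%s]"
                % (flag, partition_name)
            )
    return list(flags)


def error_for(partition_name, flags, known=KNOWN_FLAGS):
    """Return the error text for a flag list, or None if it is accepted."""
    try:
        check_flags(partition_name, flags, known)
    except PartitionConfigError as exc:
        return str(exc)
    return None
```

Under these assumptions, a partition named boot carrying a prep flag is rejected until prep is added to the known set.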

Changed in diskimage-builder:
assignee: Ian Wienand (iwienand) → Mikhail S Medvedev (msmedved)
Changed in diskimage-builder:
assignee: Mikhail S Medvedev (msmedved) → Ian Wienand (iwienand)
Ian Wienand (iwienand) wrote :

Ok, so what seems to be at the root of this is that "grub-install" for ieee-1275 is trying to find a companion PReP partition (type 0x41) to the root partition, into which it will copy the openfirmware binaries.
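
To check whether an image actually carries a partition of type 0x41, the MBR partition table can be read directly. The sketch below assumes a classic msdos (MBR) label, where the four 16-byte entries start at byte 446 of sector 0 and byte 4 of each entry holds the type. The function names and the demo sector are made up for illustration; this is not how grub itself probes.

```python
from typing import List

PREP_TYPE = 0x41           # PReP boot partition type mentioned above
MBR_TABLE_OFFSET = 446     # start of the four primary partition entries
MBR_ENTRY_SIZE = 16
MBR_SIGNATURE = b"\x55\xaa"


def mbr_partition_types(sector0: bytes) -> List[int]:
    """Return the type byte of each of the four primary MBR entries."""
    if len(sector0) < 512 or sector0[510:512] != MBR_SIGNATURE:
        raise ValueError("not an MBR boot sector")
    types = []
    for index in range(4):
        start = MBR_TABLE_OFFSET + MBR_ENTRY_SIZE * index
        types.append(sector0[start + 4])  # byte 4 of an entry is its type
    return types


def prep_partition_numbers(sector0: bytes) -> List[int]:
    """Return the 1-based numbers of partitions whose type is PReP."""
    return [
        number
        for number, ptype in enumerate(mbr_partition_types(sector0), start=1)
        if ptype == PREP_TYPE
    ]


def demo_sector() -> bytes:
    """Build a fake sector 0: partition 1 is PReP, partition 2 is Linux."""
    sector = bytearray(512)
    sector[MBR_TABLE_OFFSET + 4] = PREP_TYPE
    sector[MBR_TABLE_OFFSET + MBR_ENTRY_SIZE + 4] = 0x83
    sector[510:512] = MBR_SIGNATURE
    return bytes(sector)
```

On a real build the first 512 bytes of the loop device would be passed in instead of demo_sector().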

So after adding the extra prep partition for PPC in https://review.openstack.org/447739 the bootloader does correctly make this on loop0p1 and we have the root device at /dev/loop0p2. However, when we "grub-install /dev/loop0p2" here, we get the weird error:

 /usr/sbin/grub-install: error: the chosen partition is not a PReP partition

I *think* this is all tied up around [1] where it does a bunch of path matching. However, I think that when finding device-mapper, grub takes a different probe path. This probably explains why in [2] the ppc hard-coded itself to run kpartx and then overrides the devices to the /dev/mapper equivalents (note the original change gives no clues in the changelog about why it does this, and no comments are left at all).

So by running kpartx during partitioning to make /dev/mapper/loop* entries, and then hard-coding the PPC build to use /dev/mapper/loop0p2, "grub-install" was happy and Mikhail tested and made a bootable image.
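
As a rough illustration of that override, the sketch below derives the /dev/mapper name for a partition of a loop device. It assumes the naming seen in the logs in this report (loop0 becomes /dev/mapper/loop0pN after kpartx has run); the function name is hypothetical, and nothing here runs kpartx itself.

```python
import os
import re


def mapper_partition_path(loop_device: str, partition: int) -> str:
    """Map e.g. ("/dev/loop0", 2) to "/dev/mapper/loop0p2".

    Assumption: the device-mapper entries follow the loopNpM naming
    that appears in the build logs of this report.
    """
    name = os.path.basename(loop_device)
    if not re.fullmatch(r"loop\d+", name):
        raise ValueError("not a loop device: %s" % loop_device)
    if partition < 1:
        raise ValueError("partition numbers start at 1")
    return "/dev/mapper/%sp%d" % (name, partition)
```

With the two-partition PPC layout described above, the root device handed to "grub-install" would then be mapper_partition_path("/dev/loop0", 2).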

This leaves the question of what to do ...

@Andreas -- in [3] why do we have the kpartx call conditionally after the udev/partprobe call? Would we be better off using /dev/mapper/* throughout the code, instead of /dev/loop*?

A hack that I think I will try might be to symlink /dev/mapper/loop* entries back to /dev/loop* entries just in the PPC build around the call to "grub-install". I'm hoping this might fool it into finding the right partitions using its /dev/mapper support. I think this would be a rather massive hack, and probably limit the partitioning magic that can happen on PPC ... but it might be enough for now to at least get us back to where we were.

[1] https://github.com/coreos/grub/blob/grub-2.02-beta2/grub-core/osdep/linux/ofpath.c#L517
[2] https://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/vm/block-device.d/10-partition?h=1.28.0#n45
[3] https://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/block_device/level1/partitioning.py#n197
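
The symlink hack floated above could look roughly like the following. This is only a sketch of one reading of the idea (entries created under the mapper directory that point back at the loop partition nodes), not the change that was eventually merged. The directories are parameters so it can be tried on a scratch directory; all function names are made up, and whether grub would accept such links is exactly the open question.

```python
import os
import re
import tempfile

LOOP_PARTITION = re.compile(r"loop\d+p\d+")


def link_loop_partitions(dev_dir, mapper_dir):
    """Create mapper_dir/loopNpM symlinks pointing at dev_dir/loopNpM.

    Returns the links created so they can be removed after grub-install.
    """
    os.makedirs(mapper_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(dev_dir)):
        if not LOOP_PARTITION.fullmatch(name):
            continue
        link = os.path.join(mapper_dir, name)
        if os.path.lexists(link):
            continue  # an entry with this name already exists, leave it alone
        os.symlink(os.path.join(dev_dir, name), link)
        created.append(link)
    return created


def remove_links(links):
    """Undo link_loop_partitions()."""
    for link in links:
        os.unlink(link)


def demo():
    """Run against a scratch directory instead of the real /dev."""
    with tempfile.TemporaryDirectory() as dev_dir:
        for name in ("loop0", "loop0p1", "loop0p2", "sda1"):
            open(os.path.join(dev_dir, name), "w").close()
        mapper_dir = os.path.join(dev_dir, "mapper")
        links = link_loop_partitions(dev_dir, mapper_dir)
        names = [os.path.basename(link) for link in links]
        all_links = all(os.path.islink(link) for link in links)
        remove_links(links)
        return names, all_links, os.listdir(mapper_dir)
```

In a real build dev_dir would be /dev and mapper_dir /dev/mapper, with the links removed again straight after the "grub-install" call.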

Changed in diskimage-builder:
assignee: Ian Wienand (iwienand) → Andreas Florath (ansreas)
Changed in diskimage-builder:
assignee: Andreas Florath (ansreas) → Ian Wienand (iwienand)
Changed in diskimage-builder:
assignee: Ian Wienand (iwienand) → Mikhail S Medvedev (msmedved)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on diskimage-builder (master)

Change abandoned by Ian Wienand (<email address hidden>) on branch: master
Review: https://review.openstack.org/447739
Reason: See I0918e8df8797d6dbabf7af618989ab7f79ee9580

Mikhail S Medvedev (msmedved) wrote :
Changed in diskimage-builder:
status: In Progress → Fix Committed
assignee: Mikhail S Medvedev (msmedved) → nobody