[1.9] grub-install error on power8 deployment

Bug #1523779 reported by Newell Jensen
This bug affects 9 people
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   Critical     Blake Rouse
  1.9     Fix Released   Critical     Blake Rouse
curtin    Fix Released   High         Unassigned

Bug Description

On a brand new 1.9rc3 install I am running into the following error while deploying power8 lpars:

[ 515.215462] cloud-init[2019]: Installing for powerpc-ieee1275 platform.
[ 519.989131] cloud-init[2019]: File descriptor 3 (pipe:[82785]) leaked on vgs invocation. Parent PID 38035: grub-install
[ 519.990023] cloud-init[2019]: File descriptor 5 (/dev/sde1) leaked on vgs invocation. Parent PID 38035: grub-install
[ 520.050768] cloud-init[2019]: Found duplicate PV yrEfpMbescROdKiPs5SBZXCwFIjtPMcN: using /dev/sde2 not /dev/sda2
[ 520.075819] cloud-init[2019]: File descriptor 3 (pipe:[82785]) leaked on vgs invocation. Parent PID 38035: grub-install
[ 520.076906] cloud-init[2019]: File descriptor 5 (/dev/sde1) leaked on vgs invocation. Parent PID 38035: grub-install
[ 520.083843] cloud-init[2019]: Found duplicate PV yrEfpMbescROdKiPs5SBZXCwFIjtPMcN: using /dev/sde2 not /dev/sda2
[ 520.210166] cloud-init[2019]: grub-install: error: Can't create file: No such file or directory.
[ 520.210863] cloud-init[2019]: Failed: grub-install --target=powerpc-ieee1275
[ 520.211620] cloud-init[2019]: WARNING: Bootloader is not properly installed, system may not be bootable

Here is the entire deployment console output with the above messages starting at line 1526

http://paste.ubuntu.com/13814238/

No error messages in /var/log/maas/*.log

ubuntu@landmaas:~$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================================================-===================================================-============-===============================================================================
ii maas 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server all-in-one metapackage
ii maas-cli 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS command line API tool
ii maas-cluster-controller 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server cluster controller
ii maas-common 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server common files
ii maas-dhcp 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS DHCP server
ii maas-dns 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS DNS server
ii maas-proxy 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS Caching Proxy
ii maas-region-controller 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server complete region controller
ii maas-region-controller-min 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS Server minimum region controller
ii python-django-maas 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server Django web framework
ii python-maas-client 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS python API client
ii python-maas-provisioningserver 1.9.0~rc3+bzr4525-0ubuntu1~trusty1 all MAAS server provisioning libraries

The above pastebin was with the default LVM layout for MAAS 1.9. I also tried commissioning and deploying with a flat filesystem, and I still get a grub-install error from curtin:

http://paste.ubuntu.com/14025172/

Related bugs:
 * bug 1526542: maas sends duplicate device info in config / need to be multipath aware
 * bug 1543263: doesn't support partitions with partition_number where a previous partition of that number doesn't exist

Tags: oil

Changed in maas:
milestone: none → 1.9.0
Changed in maas:
milestone: 1.9.0 → none
description: updated
Jeff Lane  (bladernr)
tags: added: blocks-hwcert-server
description: updated
description: updated
Revision history for this message
Newell Jensen (newell-jensen) wrote :

On the installed system I ran:

ubuntu@obese-pleasure:~$ sudo grub-install /dev/sda
sudo: unable to resolve host obese-pleasure
Installing for powerpc-ieee1275 platform.
grub-install: error: failed to get canonical path of `overlayroot'.
ubuntu@obese-pleasure:~$ sudo update-grub
sudo: unable to resolve host obese-pleasure
/usr/sbin/grub-probe: error: failed to get canonical path of `overlayroot'.

This is mentioned in the cloud-init bug:

https://bugs.launchpad.net/cloud-init/+bug/1276648

Revision history for this message
Scott Moser (smoser) wrote :

I can reproduce this now, which is good.
I filed bug 1526542 to cover the first issue I hit. I'm not sure how Newell didn't hit it.

I'll debug further tomorrow.

Changed in curtin:
status: New → Confirmed
importance: Undecided → High
description: updated
Revision history for this message
Scott Moser (smoser) wrote :

OK. In summary, what is wrong here is that MAAS provided invalid storage data.
The storage config sent is below (I did trim out some duplicate and unused devices).
MAAS said to install grub onto /dev/sda. On power systems the grub that is installed is grub-ieee1275, which requires an 8M PrEP partition. Since MAAS did not provide one, the installation of grub failed.

On powerNV, the bootloader that is actually used is petitboot, so grub is not even strictly necessary. I'm not sure whether petitboot supports reading grub config from LVM volumes. That said, guests in KVM on power systems, and powerVM systems, *do* need grub. Previously, 'curtin block-meta simple' would configure all power systems by creating an 8M PrEP partition and installing grub to that.

I've got a storage config that installs successfully. I'll attach here the one that MAAS sent and the modified version that worked.

Revision history for this message
Scott Moser (smoser) wrote :

The suggested config has the following changes:
## - add 'sda-part1' as prep partition at beginning of disk.
## - rename sda-part1 to sda-part2
## - move 'grub_device' from 'sda' to 'sda-part1'
## - modify sizes of vgroot-lvroot, sda-part2 to take away 8*1024*1024 bytes

It doesn't completely install at the moment; I think there are still a few things to fix in curtin.

Note, the 'block-meta simple' still does work fine for installing.
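
The four changes listed above can be sketched as a small transformation over the storage config entries. This is only an illustration, not code from curtin or MAAS: the function name add_prep_partition and the helper to_bytes are hypothetical, and the input sizes are not the ones MAAS actually sent (that attachment is not reproduced here). They were reconstructed by adding 8*1024*1024 bytes back to the sizes in the working config. The sketch also assumes entries appear in dependency order (volume group before its logical volumes).

```python
import copy

PREP_SIZE = 8 * 1024 * 1024  # 8 MiB, the PrEP partition size used in this thread


def to_bytes(value):
    """Parse a size/offset string such as '8388608B' into an int."""
    return int(value.rstrip("B"))


def add_prep_partition(config, disk="sda"):
    """Return a copy of a storage config list with a PrEP partition added.

    Hypothetical helper: applies the four changes listed in the comment above.
    """
    old_id, new_id = f"{disk}-part1", f"{disk}-part2"
    shrunk_groups = set()
    result = []
    for entry in copy.deepcopy(config):
        kind = entry["type"]
        if kind == "disk" and entry["id"] == disk:
            # grub_device moves from the disk to the new PrEP partition.
            entry.pop("grub_device", None)
        elif kind == "partition" and entry["id"] == old_id:
            start = to_bytes(entry["offset"])
            # New sda-part1: PrEP partition at the old start of the disk data.
            result.append({
                "device": disk, "grub_device": True, "id": old_id,
                "name": old_id, "number": 1, "flag": "prep",
                "offset": f"{start}B", "size": f"{PREP_SIZE}B",
                "type": "partition", "wipe": "zero",
            })
            # The old sda-part1 becomes sda-part2, shifted and shrunk by 8 MiB.
            entry.update(
                id=new_id, name=new_id, number=2,
                offset=f"{start + PREP_SIZE}B",
                size=f"{to_bytes(entry['size']) - PREP_SIZE}B",
            )
        elif kind == "lvm_volgroup" and old_id in entry["devices"]:
            entry["devices"] = [new_id if d == old_id else d
                                for d in entry["devices"]]
            shrunk_groups.add(entry["id"])
        elif kind == "lvm_partition" and entry["volgroup"] in shrunk_groups:
            entry["size"] = f"{to_bytes(entry['size']) - PREP_SIZE}B"
        result.append(entry)
    return result


# Illustrative input only (sizes reconstructed, see the note above).
original = [
    {"id": "sda", "type": "disk", "ptable": "msdos", "grub_device": True},
    {"id": "sda-part1", "name": "sda-part1", "type": "partition",
     "device": "sda", "number": 1,
     "offset": "4194304B", "size": "283786608640B"},
    {"id": "vgroot", "name": "vgroot", "type": "lvm_volgroup",
     "devices": ["sda-part1"]},
    {"id": "vgroot-lvroot", "name": "lvroot", "type": "lvm_partition",
     "volgroup": "vgroot", "size": "283782414336B"},
]
fixed = add_prep_partition(original)
```

The format and mount entries are left out because none of the listed changes touch them. The partition table type is also left alone here, since switching to GPT is a separate suggestion.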

Changed in maas:
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

There does seem to be a bug in curtin here, possibly exposed by the multipath that is present on my system, and also one in the invocation of install-grub. I'll look more into those tomorrow.

Revision history for this message
Scott Moser (smoser) wrote :

One other change that should be made: MAAS should probably create a GPT partition table rather than msdos.

Revision history for this message
Scott Moser (smoser) wrote :

here is my working config:
partitioning_commands:
  builtin: [curtin, block-meta, custom]
storage:
  config:
  - {id: sda, model: IPR-0 5EC29C00, name: sda, ptable: gpt, serial: IBM_IPR-0_5EC29C0000000080, type: disk, wipe: superblock}
  - {device: sda, grub_device: true, id: sda-part1, name: sda-part1, number: 1, flag: prep, offset: 4194304B, size: 8388608B, type: partition, wipe: zero}
  - {device: sda, id: sda-part2, name: sda-part2, number: 2, offset: 12582912B, size: 283778220032B, type: partition, uuid: 6193de67-6b76-4cb6-bf6d-25518842d54b, wipe: superblock}
  - devices: [sda-part2]
    id: vgroot
    name: vgroot
    type: lvm_volgroup
    uuid: 93d1991a-9631-4051-81dd-c60d3267f23b
  - {id: vgroot-lvroot, name: lvroot, size: 283774025728B, type: lvm_partition, volgroup: vgroot}
  - {fstype: ext4, id: vgroot-lvroot_format, label: root, type: format, uuid: efdcbb0b-b9ca-455f-8cf9-1dbfc820717a, volume: vgroot-lvroot}
  - {device: vgroot-lvroot_format, id: vgroot-lvroot_mount, path: /, type: mount}
  version: 1
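
As a quick sanity check on the numbers in the config above, the following sketch re-derives the layout arithmetic. The offsets and sizes are copied from the config; the function check_layout and the particular checks it makes (a PrEP-flagged partition exists, it comes first, and the partitions are contiguous) are my own assumptions about what is worth verifying, not rules taken from curtin.

```python
MIB = 1024 * 1024

# Offsets and sizes in bytes, copied from the working config above.
partitions = [
    {"id": "sda-part1", "flag": "prep", "offset": 4194304, "size": 8388608},
    {"id": "sda-part2", "offset": 12582912, "size": 283778220032},
]
lvroot_size = 283774025728


def check_layout(parts):
    """Return a list of problems found in the partition layout (empty if none)."""
    problems = []
    ordered = sorted(parts, key=lambda p: p["offset"])
    if not any(p.get("flag") == "prep" for p in ordered):
        problems.append("no partition is flagged 'prep'")
    elif ordered[0].get("flag") != "prep":
        problems.append("the PrEP partition is not first on the disk")
    # Each partition should start exactly where the previous one ends.
    for prev, nxt in zip(ordered, ordered[1:]):
        end = prev["offset"] + prev["size"]
        if end != nxt["offset"]:
            problems.append(
                f"{nxt['id']} starts at {nxt['offset']}, expected {end}")
    return problems


problems = check_layout(partitions)
prep_mib = partitions[0]["size"] // MIB
# Space in sda-part2 that is not handed to the logical volume.
headroom = partitions[1]["size"] - lvroot_size

print(problems)
print(f"PrEP partition: {prep_mib} MiB, starting at "
      f"{partitions[0]['offset'] // MIB} MiB")
print(f"sda-part2 minus lvroot: {headroom // MIB} MiB")
```

The PrEP partition starts at 4 MiB and is 8 MiB long, so sda-part2 begins at 12 MiB (12582912 bytes), and the logical volume is 4 MiB smaller than the partition that backs it.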

Revision history for this message
Scott Moser (smoser) wrote :

OK,
At this point I'm about to go on break, so I'm not confident pulling the code into trunk right now, but I believe the branch at
lp:~smoser/curtin/trunk.lp1523779 represents fixed curtin for this issue.

There are still basically 2 MAAS issues that will block this though:
 a.) maas will need to not send entries for both backing disks of a multipath disk. (bug 1526542).
 b.) maas will need to learn that power systems need a PrEP partition and also that they should use GPT.

Curtin does still have a bug when using MBR with a storage config: it will not understand or act on 'flag: prep' in msdos mode.

Any powerNV system likely needs both 'a' and 'b' (as I believe they have multipath almost by default).
A KVM guest (powerKVM or KVM on Ubuntu) or a powerVM is not likely to see multipath, so probably only 'b' is necessary.

I believe that powerVM is what Newell was testing with.
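
For issue 'a', one way to picture "not sending entries for both backing disks" is to collapse disk entries that share a serial. This is a rough sketch only, and not necessarily how bug 1526542 was resolved in MAAS. It assumes that both paths to a multipath disk report the same serial, and the sde entries in the example are hypothetical (the thread only shows that sda and sde carried the same PV). The function name drop_duplicate_paths is made up for illustration.

```python
def drop_duplicate_paths(config):
    """Keep one disk entry per serial and drop entries built on the others.

    Returns (kept_entries, dropped), where dropped maps each removed disk id
    to the id of the disk that was kept for the same serial.
    """
    first_by_serial = {}
    dropped = {}
    kept = []
    for entry in config:
        serial = entry.get("serial")
        if entry.get("type") == "disk" and serial:
            first = first_by_serial.setdefault(serial, entry["id"])
            if first != entry["id"]:
                # Second path to a disk we have already seen.
                dropped[entry["id"]] = first
                continue
        kept.append(entry)
    # Also drop partitions that sit directly on a removed disk.
    kept = [e for e in kept if e.get("device") not in dropped]
    return kept, dropped


# Hypothetical config: sda and sde are two paths to the same disk.
config = [
    {"id": "sda", "type": "disk", "serial": "IBM_IPR-0_5EC29C0000000080"},
    {"id": "sde", "type": "disk", "serial": "IBM_IPR-0_5EC29C0000000080"},
    {"id": "sda-part1", "type": "partition", "device": "sda"},
    {"id": "sde-part1", "type": "partition", "device": "sde"},
]
kept, dropped = drop_duplicate_paths(config)
print([e["id"] for e in kept])
print(dropped)
```

The sketch only follows direct "device" references, so anything layered on top of a dropped partition (a volume group, for example) would need the same treatment.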

Revision history for this message
Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

The branch lp:~wesley-wiedenmeier/curtin/trunk.lp1523779 builds on the fix for curtin in lp:~smoser/curtin/trunk.lp1523779 and adds support for setting the PrEP flag when in msdos mode.

Changed in maas:
milestone: none → next
status: Confirmed → Triaged
importance: Undecided → High
importance: High → Critical
Revision history for this message
Scott Moser (smoser) wrote :

The fix is committed in curtin at revno 332. We can/should gain PrEP support for msdos also, but MAAS really should partition power systems as GPT.

Changed in curtin:
status: Confirmed → Fix Committed
Larry Michel (lmic)
tags: added: oil
summary: - [1.9rc3] grub-install error on power8 deployment
+ [1.9] grub-install error on power8 deployment
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: Triaged → In Progress
Scott Moser (smoser)
description: updated
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
rory schramm (roryschramm) wrote :

That seems to be working!!!!
I was able to deploy power to bare metal using the MAAS GUI (haven't tried juju yet).

However, I'm seeing some Python errors in the clusterd log on the MAAS server. Not sure if it's related. The MAAS clusterd.log is attached.

Revision history for this message
rory schramm (roryschramm) wrote :

juju is failing to orchestrate due to missing 1.25.4 packages that fix https://bugs.launchpad.net/juju-core/1.25/+bug/1532167. The bug page shows that a fix was released. However, it's not in the stable or devel PPA.

Jeff Lane  (bladernr)
tags: removed: blocks-hwcert-server
Changed in curtin:
status: Fix Committed → Fix Released
Changed in maas:
milestone: next → none