[2.6] Unable to reboot s390x KVM machine after initial deploy

Bug #1859656 reported by Sean Feole on 2020-01-14
This bug affects 2 people
Affects                  Status      Importance  Assigned to  Milestone
MAAS                     New         Undecided   Lee Trager
QEMU                     Incomplete  Undecided   Unassigned
Ubuntu on IBM z Systems  Triaged     High        MAAS

Bug Description

MAAS version: 2.6.1 (7832-g17912cdc9-0ubuntu1~18.04.1)
Arch: S390x

It appears that MAAS cannot find the s390x bootloader needed to boot from disk; I am not sure how MAAS determines this. However, this was working in the past. I had originally thought that once a MAAS machine was deployed, it defaulted to booting from disk.

If I force the VM to boot from disk, the VM starts up as expected.

Reproduce:

- Deploy Disco on S390x KVM instance
- Reboot it

on the KVM console...

Connected to domain s2lp6g001
Escape character is ^]
done
  Using IPv4 address: 10.246.75.160
  Using TFTP server: 10.246.72.3
  Bootfile name: 'boots390x.bin'
  Receiving data: 0 KBytes
  TFTP error: file not found: boots390x.bin
Trying pxelinux.cfg files...
  Receiving data: 0 KBytes
  Receiving data: 0 KBytes
Failed to load OS from network

==> /var/log/maas/rackd.log <==
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] boots390x.bin requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/65a9ca43-9541-49be-b315-e2ca85936ea2 requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/01-52-54-00-e5-d7-bb requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF64BA0 requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF64BA requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF64B requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF64 requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF6 requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0AF requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0A requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/0 requested by 10.246.75.160
2020-01-14 18:21:24 provisioningserver.rackdservices.tftp: [info] s390x/default requested by 10.246.75.160
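
The request sequence in rackd.log matches the usual pxelinux.cfg lookup order: client UUID, then the MAC address prefixed with 01, then the IPv4 address in hex shortened one digit at a time, then "default". A minimal Python sketch of that order, assuming the s390x/ prefix seen in the log. The function name is made up for illustration; this is not MAAS or QEMU code.

```python
import ipaddress


def pxelinux_candidates(prefix, uuid, mac, ip):
    """Return config paths in the order the log above shows them being requested."""
    paths = [f"{prefix}/{uuid}"]
    # "01" is the ARP hardware type for Ethernet; MAC octets are joined with dashes.
    paths.append(f"{prefix}/01-{mac.lower().replace(':', '-')}")
    # IPv4 address as 8 uppercase hex digits, then shortened one digit at a time.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    paths.extend(f"{prefix}/{hex_ip[:n]}" for n in range(8, 0, -1))
    paths.append(f"{prefix}/default")
    return paths


if __name__ == "__main__":
    for path in pxelinux_candidates(
        "s390x",
        "65a9ca43-9541-49be-b315-e2ca85936ea2",
        "52:54:00:e5:d7:bb",
        "10.246.75.160",
    ):
        print(path)
```

Every one of these requests is logged, but none returns a config that makes the guest boot, which is consistent with the "Failed to load OS from network" message on the console.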

Sean Feole (sfeole) wrote :
summary: - [2.6] Unable to reboot s390x machine after initial deploy
+ [2.6] Unable to reboot s390x KVM machine after initial deploy
Sean Feole (sfeole) wrote :

Powering off the machine after its initial deployment renders the machine unusable.

Workaround:
Release & Redeploy again.

Sean Feole (sfeole) on 2020-01-14
description: updated
Frank Heimes (fheimes) on 2020-01-16
Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → MAAS (maas)
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Sean Feole (sfeole) wrote :

Please note that I have updated MAAS to version 2.6.2 from the proposed PPA and this problem still exists.

Lee Trager (ltrager) wrote :

boots390x.bin is a placeholder; the file shouldn't exist. It's defined as a way to let MAAS know the architecture of the machine being booted. The bootloader itself is shipped with kvm. You do need a newer version of kvm. Bionic should work; I can't remember if it was backported to Xenial.

Changed in maas:
status: New → Incomplete
Lee Trager (ltrager) wrote :

I forgot to add that each Pod/kvm host has to have Bionic or newer. This blog post explains it pretty well:

http://ubuntu-on-big-iron.blogspot.com/2019/08/maas-kvm-on-s390x-part1.html

Sean Feole (sfeole) wrote :

Hey Lee,
I took a look at that document. I want to make a few points here. This has worked in the past, with earlier versions of MAAS. Nothing has ever changed on my LPAR that is hosting the VMs.
The host LPAR is Bionic, 18.04. This was working for months and suddenly stopped; the only change in my labs has been me updating the MAAS servers to newer versions.

According to the libvirt documentation, the s390x arch only respects the first <boot> param under <os> in the XML.
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-bionic'>hvm</type>
    <boot dev='network'/>
    <boot dev='hd'/>
  </os>

In normal circumstances, net boot fails and the VM defaults to the HD, but on s390x that's not the case. Once net boot fails then

Connected to domain s2lp6g001
Escape character is ^]
done
  Using IPv4 address: 10.246.75.177
  Using TFTP server: 10.246.72.3
  Bootfile name: 'boots390x.bin'
  Receiving data: 0 KBytes
  TFTP error: file not found: boots390x.bin
Trying pxelinux.cfg files...
  Receiving data: 0 KBytes
  Receiving data: 0 KBytes
Failed to load OS from network

you're stuck there in the off state forever. Changing the XML so that the VM boots from the HD first works; however, that's not really acceptable in this use case.
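
To make the point about the first <boot> entry concrete, here is a small Python sketch that reads the <os> block quoted above and reports which devices would actually be tried. The helper name is hypothetical, and the "first entry only" rule for s390x is taken from this comment, not from a libvirt API.

```python
import xml.etree.ElementTree as ET

# The <os> block quoted above.
OS_XML = """
<os>
  <type arch='s390x' machine='s390-ccw-virtio-bionic'>hvm</type>
  <boot dev='network'/>
  <boot dev='hd'/>
</os>
"""


def effective_boot_devs(os_xml):
    """List the boot devices that would be tried, in order."""
    os_elem = ET.fromstring(os_xml.strip())
    devs = [boot.get("dev") for boot in os_elem.findall("boot")]
    arch = os_elem.find("type").get("arch")
    # Assumption from this report: s390x honors only the first entry,
    # so there is no fallback to later devices.
    return devs[:1] if arch == "s390x" else devs


if __name__ == "__main__":
    print(effective_boot_devs(OS_XML))
```

With the XML above this yields only the network device, so a failed net boot leaves nothing else to try.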

I would imagine that for this to work, MAAS DHCP would have to instruct the s390x VM to boot from "local" (disk) once it's already deployed, all by means of /var/lib/maas/dhcpd.conf.

Is that not how this is designed to work?

Lee Trager (ltrager) wrote :

You are correct that the XML shouldn't have to be changed to work with MAAS. All architectures MAAS currently supports always boot from the network. MAAS gives the network boot loader a config file which tells it if it should boot into an ephemeral environment over the network or local boot.

From the log you posted, MAAS is doing everything it should do. MAAS specifies boots390x.bin as a placeholder so we know what architecture is booting[1]. When the kvm-provided bootloader runs, MAAS returns an empty config file because that should instruct the kvm bootloader to boot from disk[2].

I added the qemu-kvm project because the only explanation I can think of is that qemu-kvm has updated its bootloader in a way that broke MAAS. Do you know which version of qemu-kvm previously worked?

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/boot/s390x.py#n76
[2] https://git.launchpad.net/maas/tree/src/provisioningserver/boot/s390x.py#n129

affects: qemu-kvm → qemu

qemu-kvm hasn't existed for years; I have marked it for qemu instead.
Thanks Frank for making me aware.

Sean got everything right in comment #6: it can only boot from one entry, and that is the first boot entry.
There is no fallback/fallthrough on s390x.

If you stick with global boot options, the host would need to change the XML to boot from disk in this case. (BTW, that has been the case since the beginning; the comment is from libvirt 3.5, somewhere around zesty I think.)

P.S. if this ever worked it was a bug that is not to be relied upon (but I'd wonder)

But that doesn't mean it won't work, just not with that XML format.
I've never tested it but I think you might be able to get away with a proper bootorder config.
An example can be found here [1] that you might try (do not implement it directly, give it a test please)

[1]: https://libvirt.org/git/?p=libvirt.git;a=blob;f=tests/qemuxml2xmloutdata/machine-loadparm-multiple-disks-nets-s390.xml;h=c4e08fd4401bf5bf448ee45ab8890b3e44057f97;hb=HEAD
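
As a rough illustration of what "a proper bootorder config" means, the Python sketch below drops the global <boot dev=.../> entries from <os> and adds per-device <boot order='N'/> elements instead (libvirt does not allow mixing the two styles). The domain XML is a stripped-down, hypothetical example and the function name is invented; treat it as something to test, not as a verified fix.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition for illustration only.
DOMAIN_XML = """
<domain type='kvm'>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-bionic'>hvm</type>
    <boot dev='network'/>
    <boot dev='hd'/>
  </os>
  <devices>
    <disk type='file' device='disk'/>
    <interface type='network'/>
  </devices>
</domain>
"""


def use_per_device_boot_order(domain_xml, first="interface"):
    """Replace global <os><boot> entries with per-device <boot order='N'/>."""
    domain = ET.fromstring(domain_xml.strip())
    os_elem = domain.find("os")
    for boot in os_elem.findall("boot"):
        os_elem.remove(boot)
    bootable = [d for d in domain.find("devices") if d.tag in ("disk", "interface")]
    # Stable sort: devices whose tag matches `first` get the lowest order numbers.
    bootable.sort(key=lambda dev: dev.tag != first)
    for order, dev in enumerate(bootable, start=1):
        ET.SubElement(dev, "boot", {"order": str(order)})
    return ET.tostring(domain, encoding="unicode")


if __name__ == "__main__":
    print(use_per_device_boot_order(DOMAIN_XML))
```

The resulting XML would then be applied with the usual libvirt tooling on a test guest first.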

Changed in qemu:
status: New → Incomplete
Sean Feole (sfeole) wrote :

I also tried the workaround mentioned by Christian last week; I did not mention that in comment #6.

I tried using the boot order configuration as outlined in the example (comment #9).

After the machine deploys, the same symptom occurs, so we are still stuck.

Domain s2lp6g001 started

Connected to domain s2lp6g001
Escape character is ^]
done
  Using IPv4 address: 10.246.75.152
  Using TFTP server: 10.246.72.3
  Bootfile name: 'boots390x.bin'
  Receiving data: 0 KBytes
  TFTP error: file not found: boots390x.bin
Trying pxelinux.cfg files...
  Receiving data: 0 KBytes
  Receiving data: 0 KBytes
Failed to load OS from network

Changed in maas:
status: Incomplete → New