Run emulated riscv64 VMs on amd64

Bug #2023211 reported by Colin Watson
This bug affects 2 people
Affects                        Status       Importance  Assigned to   Milestone
OpenStack Compute (nova)       In Progress  Undecided   Felipe Reyes
OpenStack Nova Compute Charm   Invalid      Undecided   Unassigned
nova (Ubuntu)                  Confirmed    Undecided   Unassigned

Bug Description

[Impact]

Since the OpenStack Yoga release, it is possible to run emulated architectures ( https://docs.openstack.org/nova/latest/admin/hw-emulation-architecture.html ), but riscv64 is not in the list of supported architectures.

In the Launchpad build farm, we run a cluster of riscv64 virtual machines that deal with building riscv64 artifacts, including .debs for Ubuntu itself. We don't currently have hypervisor-capable riscv64 hardware to run these on, so we're using qemu system emulation on commodity amd64 hardware. This works OK, but we currently do this in manually-configured libvirt instances; we'd much rather be able to do it on our internal OpenStack clouds.

[Test Case]

  $ wget http://cloud-images.ubuntu.com/server/releases/jammy/release-20220420/ubuntu-22.04-server-cloudimg-riscv64.img
  $ openstack image create --disk-format qcow2 --file ~cjwatson/ubuntu-22.04-server-cloudimg-riscv64.img --private --property architecture=riscv64 --property item_name=disk1.img --property os_distro=ubuntu --property os_version=22.04 cjwatson-riscv64-test
  $ openstack image set --property hw_emulation_architecture=riscv64 cjwatson-riscv64-test
  $ openstack image set --property hw_machine_type=virt cjwatson-riscv64-test
  $ openstack server create --image cjwatson-riscv64-test --flavor vbuilder --network vbuilder_staging_test_net cjwatson-riscv64-test

Expected result: the created instance reaches the ACTIVE state.

Actual result: the "openstack server create" command fails with the following error message:
  Invalid image metadata. Error: Architecture name 'riscv64' is not valid (HTTP 400) (Request-ID: req-023932ea-7c90-4be3-89b8-6bd19718919a)

This causes me to think that, even if I've left out some property or other (e.g. firmware), basic things like the riscv64 architecture name being valid aren't currently in place. But it's certainly possible I've got something wrong here. If there's a known way to make this work, could it please be documented?

I've attached `virsh dumpxml` output from one of the manual libvirt instances we use at present, in case it's useful.

Colin Watson (cjwatson) wrote :
Felipe Reyes (freyes) wrote :

We (OpenStack Engineering) need to identify whether there are gaps in the support for running riscv64 emulation. The upstream documentation is at https://docs.openstack.org/nova/latest/admin/hw-emulation-architecture.html ; the support landed in Yoga. If no issues are found in the charm, we should assess whether we can improve the charm-guide to help operators achieve this setup.

Felipe Reyes (freyes) wrote :

The issue here seems to be that the nova.objects.fields.Architecture class[0] doesn't have RISCV64 registered, so the list defined in os-traits[1] is not enough on its own to support an emulated architecture.

Here I'm attaching a diff that can be used as a starting point.
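
The attached diff is not reproduced here. As a rough illustration of the kind of check that rejects the request, the following is a minimal, self-contained sketch. It is not Nova's code: `KNOWN_ARCHITECTURES`, `canonicalize` and the exception class are hypothetical stand-ins, and the set of names is only an illustrative subset.

```python
# Hypothetical stand-in for an architecture registry; this is NOT Nova's code.
# The names below are an illustrative subset, chosen as an assumption.
KNOWN_ARCHITECTURES = frozenset({"aarch64", "mipsel", "ppc64le", "s390x", "x86_64"})


class InvalidArchitectureName(ValueError):
    """Raised when an architecture name is not in the registry."""


def canonicalize(name, known=KNOWN_ARCHITECTURES):
    """Lower-case the name and reject it unless it is registered."""
    candidate = name.strip().lower()
    if candidate not in known:
        raise InvalidArchitectureName(
            f"Architecture name '{name}' is not valid")
    return candidate


# With riscv64 missing from the registry, validation fails in the same
# spirit as the HTTP 400 reported in this bug.
try:
    canonicalize("riscv64")
except InvalidArchitectureName as exc:
    print(exc)

# Once the name is registered, the same call succeeds.
print(canonicalize("riscv64", KNOWN_ARCHITECTURES | {"riscv64"}))
```

The point of the sketch is only that image metadata is validated against a fixed list of names before anything reaches the hypervisor, so a missing entry fails early regardless of what libvirt or qemu could actually run.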

Reproducer:

```
test -f ubuntu-22.04-server-cloudimg-riscv64.img || wget http://cloud-images.ubuntu.com/server/releases/jammy/release-20220420/ubuntu-22.04-server-cloudimg-riscv64.img
openstack image create \
    --disk-format qcow2 \
    --file ./ubuntu-22.04-server-cloudimg-riscv64.img \
    --public \
    --property architecture=riscv64 \
    --property item_name=disk1.img \
    --property os_distro=ubuntu \
    --property os_version=22.04 \
    riscv64-test
openstack image set --property hw_emulation_architecture=riscv64 riscv64-test
openstack image set --property hw_machine_type=virt riscv64-test
openstack server create \
    --image riscv64-test \
    --network freyes_admin_net \
    --flavor m1.medium \
    --key-name freyes \
    my-riscv64-test
```

Output:

```
Invalid image metadata. Error: Architecture name 'riscv64' is not valid (HTTP 400) (Request-ID: req-7a2d48d1-4457-40fc-b26a-7906ce929532)
```

In the logs:

```
2023-06-24 03:32:43.912 2238785 INFO nova.api.openstack.wsgi [req-7a2d48d1-4457-40fc-b26a-7906ce929532 a3ee17ed103a4cbbabdb215d9dca0482 - - 1bf127c9b631435984600ac72fa5374f 1bf127c9b631435984600ac72fa5374f] HTTP exception thrown: Invalid image metadata. Error: Architecture name 'riscv64' is not valid
2023-06-24 03:32:43.913 2238785 DEBUG nova.api.openstack.wsgi [req-7a2d48d1-4457-40fc-b26a-7906ce929532 a3ee17ed103a4cbbabdb215d9dca0482 - - 1bf127c9b631435984600ac72fa5374f 1bf127c9b631435984600ac72fa5374f] Returning 400 to user: Invalid image metadata. Error: Architecture name 'riscv64' is not valid __call__ /usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py:936
```

[0] https://opendev.org/openstack/nova/src/branch/master/nova/objects/fields.py#L120
[1] https://opendev.org/openstack/os-traits/src/branch/master/os_traits/compute/arch.py

Felipe Reyes (freyes) wrote :
Changed in charm-nova-compute:
status: New → Triaged
Felipe Reyes (freyes) wrote :

When using the patch available in comment #4 in a devstack environment, the validation passes and a riscv64 instance can be created, but the XML produced by Nova is rejected by libvirt with the following error:

ERROR nova.virt.libvirt.guest [None req-3910b77d-daf6-43fd-970d-a0989f3d72d1 demo admin] Error launching a defined domain with XML: [...]
Jul 20 14:45:20 green nova-compute[144096]: : libvirt.libvirtError: this function is not supported by the connection driver: 'riscv64' architecture is not supported by CPU driver

The attachment contains the XML generated by Nova.

Colin Watson (cjwatson) wrote :

Does your libvirt version have https://gitlab.com/libvirt/libvirt/-/commit/fd70335876 in it, i.e. 9.1.0-rc1 or better?

(Oddly, we get by with a much older libvirt right now; but I wonder if there were some bad versions in between.)
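
For anyone wanting to check this programmatically: libvirt reports its version as a single integer encoded as major * 1,000,000 + minor * 1,000 + release. The sketch below decodes that integer and compares it with 9.1.0. The helper names are hypothetical, and whether 9.1.0 is really the relevant threshold is exactly the open question in this comment, so treat the default as an assumption.

```python
def decode_libvirt_version(number):
    """Split libvirt's integer version into a (major, minor, release) tuple."""
    major, rest = divmod(number, 1_000_000)
    minor, release = divmod(rest, 1_000)
    return (major, minor, release)


def is_at_least(number, minimum=(9, 1, 0)):
    """Return True if the encoded version is at or above `minimum`.

    The 9.1.0 default mirrors the release mentioned in this thread;
    it is an assumption, not a confirmed requirement.
    """
    return decode_libvirt_version(number) >= minimum


# Example values only; a real number would come from the libvirt connection.
print(decode_libvirt_version(8_000_000), is_at_least(8_000_000))
print(decode_libvirt_version(9_001_000), is_at_least(9_001_000))
```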

Colin Watson (cjwatson) wrote :

Also, I could be wrong, but I don't know whether that's going to get very far without passing firmware arguments to qemu. See <qemu:commandline/> in the XML I attached.

Felipe Reyes (freyes) wrote : Re: [Bug 2023211] Re: Run emulated riscv64 VMs on amd64

I have a newer patch that allowed me to create an instance, although it does not seem to boot successfully. I'm investigating that issue now.

$ openstack server list

+--------------------------------------+-----------------+--------+--------------------------------------------------------+--------------+----------+
| ID                                   | Name            | Status | Networks                                               | Image        | Flavor   |
+--------------------------------------+-----------------+--------+--------------------------------------------------------+--------------+----------+
| 51d815d0-a0c2-4a9e-93ab-26aed2df746f | my-riscv64-test | ACTIVE | private=10.0.0.13, fd60:1406:b51:0:f816:3eff:fee7:b4b7 | riscv64-test | m1.small |
+--------------------------------------+-----------------+--------+--------------------------------------------------------+--------------+----------+
$ openstack console log show my-riscv64-test

OpenSBI v0.9
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name : riscv-virtio,qemu
Platform Features : timer,mfdeleg
Platform HART Count : 1
Firmware Base : 0x80000000
Firmware Size : 100 KB
Runtime SBI Version : 0.2

Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x0000000080000000-0x000000008001ffff ()
Domain0 Region01 : 0x0000000000000000-0xffffffffffffffff (R,W,X)
Domain0 Next Address : 0x0000000000000000
Domain0 Next Arg1 : 0x00000000bf000000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes

Boot HART ID : 0
Boot HART Domain : root
Boot HART ISA : rv64imafdcsu
Boot HART Features : scounteren,mcounteren,time
Boot HART PMP Count : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count : 0
Boot HART MHPM Count : 0
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109
$ sudo virsh list --all
 Id Name State
-----------------------------------
 1 instance-00000002 running

$ sudo virsh dumpxml instance-00000002 | pastebinit
https://paste.ubuntu.com/p/Bn2YBHP2HJ/

Felipe Reyes (freyes) wrote :

On Thu, 2023-07-20 at 15:28 +0000, Colin Watson wrote:
> Does your libvirt version have
> https://gitlab.com/libvirt/libvirt/-/commit/fd70335876 in it, i.e.
> 9.1.0-rc1 or better?
>

I made it work by making Nova *not* set the <cpu></cpu> node in the generated XML, which made libvirt happy. This is what Nova does for MIPSEL, which suffers from the same limitation on the libvirt side.
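
As an illustration of the idea only (this is not the Nova patch, and the XML is not a complete domain definition), the sketch below builds a tiny libvirt-style document and leaves out the <cpu> element for a hypothetical set of architectures. The names `ARCHS_WITHOUT_CPU_ELEMENT` and `build_domain_xml` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: architectures for which <cpu> is left out.
ARCHS_WITHOUT_CPU_ELEMENT = {"mipsel", "riscv64"}


def build_domain_xml(name, arch, machine, cpu_mode="custom"):
    """Return a minimal, illustrative libvirt-style domain XML string."""
    domain = ET.Element("domain", {"type": "qemu"})
    ET.SubElement(domain, "name").text = name
    os_elem = ET.SubElement(domain, "os")
    type_elem = ET.SubElement(os_elem, "type", {"arch": arch, "machine": machine})
    type_elem.text = "hvm"
    # Only emit <cpu> when the architecture is not in the skip list.
    if arch not in ARCHS_WITHOUT_CPU_ELEMENT:
        ET.SubElement(domain, "cpu", {"mode": cpu_mode})
    return ET.tostring(domain, encoding="unicode")


print(build_domain_xml("instance-riscv", "riscv64", "virt"))
print(build_domain_xml("instance-x86", "x86_64", "pc"))
```

The first document has no <cpu> node at all, which corresponds to the workaround described above; the second keeps it.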

Felipe Reyes (freyes) wrote :

Here is the console log for an instance that booted successfully. The missing piece of the puzzle was defining a kernel image, specifically the one that ships with the u-boot-qemu package.

$ openstack image create qemu-riscv64-uboot --public --disk-format aki --container-format aki --file /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf
$ openstack image set --property kernel_id=$(openstack image show -c id -f value qemu-riscv64-uboot) riscv64-test
$ openstack server create ...
[...]
$ openstack console log show my-riscv64-test | grep cloud-init
[ 266.351641] cloud-init[533]: Cloud-init v. 22.1-14-g2e17a0d6-0ubuntu1~22.04.5 running 'init' at Thu, 20 Jul 2023 20:11:05 +0000. Up 261.24 seconds.
[ 268.797660] cloud-init[533]: ci-info: +++++++++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++++++++
[ 268.828670] cloud-init[533]: ci-info: +--------+------+----------------------------------------+-----------------+--------+-------------------+
[ 268.845621] cloud-init[533]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 268.862611] cloud-init[533]: ci-info: +--------+------+----------------------------------------+-----------------+--------+-------------------+
[ 268.879394] cloud-init[533]: ci-info: | enp1s0 | True | 10.0.0.15 | 255.255.255.192 | global | fa:16:3e:bb:91:35 |
[ 268.909649] cloud-init[533]: ci-info: | enp1s0 | True | fd60:1406:b51:0:f816:3eff:febb:9135/64 | . | global | fa:16:3e:bb:91:35 |
[ 268.926517] cloud-init[533]: ci-info: | enp1s0 | True | fe80::f816:3eff:febb:9135/64 | . | link | fa:16:3e:bb:91:35 |
[ 268.944076] cloud-init[533]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ 268.973753] cloud-init[533]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 268.979713] cloud-init[533]: ci-info: +--------+------+----------------------------------------+-----------------+--------+-------------------+
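
The image-related steps from this thread can be collected in one place. The sketch below only assembles the `openstack` command lines as argument lists and prints them; it executes nothing. The function names are hypothetical, the image visibility flag used earlier in the thread is omitted, and the commands are only expected to lead to a working instance on a Nova that carries the patch discussed here.

```python
import shlex


def image_setup_commands(disk_file, image_name, kernel_file, kernel_name):
    """Return the two image uploads as argv lists (nothing is executed)."""
    disk_image = [
        "openstack", "image", "create",
        "--disk-format", "qcow2",
        "--file", disk_file,
        "--property", "architecture=riscv64",
        "--property", "hw_emulation_architecture=riscv64",
        "--property", "hw_machine_type=virt",
        image_name,
    ]
    kernel_image = [
        "openstack", "image", "create", kernel_name,
        "--disk-format", "aki",
        "--container-format", "aki",
        "--file", kernel_file,
    ]
    return [disk_image, kernel_image]


def link_kernel_command(image_name, kernel_image_id):
    """Return the command that points the disk image at the kernel image."""
    return [
        "openstack", "image", "set",
        "--property", f"kernel_id={kernel_image_id}",
        image_name,
    ]


commands = image_setup_commands(
    "./ubuntu-22.04-server-cloudimg-riscv64.img",
    "riscv64-test",
    "/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf",
    "qemu-riscv64-uboot",
)
# The kernel image ID is only known after the upload; a placeholder is used here.
commands.append(link_kernel_command("riscv64-test", "KERNEL_IMAGE_ID"))
for argv in commands:
    print(shlex.join(argv))
```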

Felipe Reyes (freyes)
description: updated
Changed in charm-nova-compute:
status: Triaged → Invalid
Changed in nova:
assignee: nobody → Felipe Reyes (freyes)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Changed in nova:
status: New → In Progress
Junien Fridrick (axino) wrote :

Update: we installed a patched version of the nova packages provided by @freyes (via a PPA) on PS6, and we can now create RISC-V instances successfully after some image manipulation.
