ppc64le: KVM guest fails to boot with an error `virtio_scsi: probe of virtio1 failed with error -22` on master

Bug #1847440 reported by Satheesh Rajendran
This bug affects 1 person
Affects    Status         Importance   Assigned to   Milestone
QEMU       Fix Released   Undecided    Unassigned

Bug Description

PowerPC KVM Guest fails to boot on current qemu master, bad commit: e68cd0cb5cf49d334abe17231a1d2c28b846afa2

Env:
HW: IBM Power9
Host Kernel: 5.4.0-rc2-00038-ge3280b54afed
Guest Kernel: 4.13.9-300.fc27.ppc64le
Qemu: https://github.com/qemu/qemu.git (master)
Libvirt: 5.4.0

Guest boot gets stuck:
...
[ OK ] Mounted Kernel Configuration File System.
[ 7.598740] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 7.598828] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
[ 7.598957] virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
[ 7.599017] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
[ 7.599123] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
[ 7.599182] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[ 7.620620] synth uevent: /devices/vio: failed to send uevent
[ 7.620624] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started udev Coldplug all Devices.
[ 7.624559] audit: type=1130 audit(1570610300.990:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ OK ] Reached target System Initialization.
[ OK ] Reached target Basic System.
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.
[ 7.642961] virtio_scsi: probe of virtio1 failed with error -22
[ *** ] A start job is running for dev-disk…21b3519a80.device (14s / no limit)
...
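
The `-22` in the `virtio_scsi` probe failure follows the usual Linux kernel convention of returning a negated errno value. A minimal sketch for decoding such a code, assuming the standard Linux errno numbering; the helper name `decode_kernel_errno` is hypothetical and is not part of any tool mentioned in this report:

```python
import errno
import os


def decode_kernel_errno(code):
    """Map a negative kernel-style return code to (symbol, description)."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    symbol = errno.errorcode.get(number, "UNKNOWN")
    return symbol, os.strerror(number)


symbol, text = decode_kernel_errno(-22)
print(symbol, "-", text)
```

On a Linux host this resolves to `EINVAL`, meaning the driver's probe routine rejected something about the device as presented. The code alone does not say which property was considered invalid.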

git bisect identified the bad commit [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on ibm,client-architecture-support; reverting this commit lets the guest boot properly.

git bisect start
# good: [9e06029aea3b2eca1d5261352e695edc1e7d7b8b] Update version for v4.1.0 release
git bisect good 9e06029aea3b2eca1d5261352e695edc1e7d7b8b
# bad: [98b2e3c9ab3abfe476a2b02f8f51813edb90e72d] Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
git bisect bad 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d
# good: [56e6250ede81b4e4b4ddb623874d6c3cdad4a96d] target/arm: Convert T16, nop hints
git bisect good 56e6250ede81b4e4b4ddb623874d6c3cdad4a96d
# good: [5d69cbdfdd5cd6dadc9f0c986899844a0e4de703] tests/tcg: target/s390x: Test MVC
git bisect good 5d69cbdfdd5cd6dadc9f0c986899844a0e4de703
# good: [88112488cf228df8b7588c8aa38e16ecd0dff48e] qapi: Make check_type()'s array case a bit more obvious
git bisect good 88112488cf228df8b7588c8aa38e16ecd0dff48e
# good: [972bd57689f1e11311d86b290134ea2ed9c7c11e] ppc/kvm: Skip writing DPDES back when in run time state
git bisect good 972bd57689f1e11311d86b290134ea2ed9c7c11e
# bad: [1aba8716c8335e88b8c358002a6e1ac89f7dd258] ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
git bisect bad 1aba8716c8335e88b8c358002a6e1ac89f7dd258
# bad: [00ed3da9b5c2e66e796a172df3e19545462b9c90] xics: Minor fixes for XICSFabric interface
git bisect bad 00ed3da9b5c2e66e796a172df3e19545462b9c90
# good: [33432d7737b53c92791f90ece5dbe3b7bb1c79f5] target/ppc: introduce set_dfp{64,128}() helper functions
git bisect good 33432d7737b53c92791f90ece5dbe3b7bb1c79f5
# good: [f6d4c423a222f02bfa84a49c3d306d7341ec9bab] target/ppc: remove unnecessary if() around calls to set_dfp{64,128}() in DFP macros
git bisect good f6d4c423a222f02bfa84a49c3d306d7341ec9bab
# bad: [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on ibm,client-architecture-support
git bisect bad e68cd0cb5cf49d334abe17231a1d2c28b846afa2
# good: [c4ec08ab70bab90685d1443d6da47293e3aa312a] spapr-pci: Stop providing assigned-addresses
git bisect good c4ec08ab70bab90685d1443d6da47293e3aa312a
# first bad commit: [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on ibm,client-architecture-support
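
The bisect log above has a regular shape: comment lines carry the verdict, hash and subject, and the `git bisect good|bad` lines replay them. A minimal sketch that summarizes such a log, assuming it is available as text; `summarize_bisect_log` and `SAMPLE` are hypothetical names, and the sample contains only the last few lines of the log quoted above:

```python
import re

# Matches the comment lines that git writes into a bisect log.
ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]+)\] (.*)$")

SAMPLE = """\
# bad: [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on ibm,client-architecture-support
git bisect bad e68cd0cb5cf49d334abe17231a1d2c28b846afa2
# good: [c4ec08ab70bab90685d1443d6da47293e3aa312a] spapr-pci: Stop providing assigned-addresses
git bisect good c4ec08ab70bab90685d1443d6da47293e3aa312a
# first bad commit: [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on ibm,client-architecture-support
"""


def summarize_bisect_log(text):
    """Return (list of (verdict, sha, subject), first_bad or None)."""
    verdicts = []
    first_bad = None
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip the replayed 'git bisect ...' commands
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            verdicts.append((kind, sha, subject))
    return verdicts, first_bad


verdicts, first_bad = summarize_bisect_log(SAMPLE)
print(len(verdicts), "tested commits; first bad:", first_bad[0])
```

Saving the full log to a file and passing its contents to the function would give the same summary for the whole bisection.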

The VM XML is attached.

QEMU command line:
/home/sath/qemu/ppc64-softmmu/qemu-system-ppc64 \
-name guest=vm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-vm1/master-key.aes \
-machine pseries-4.2,accel=kvm,usb=off,dump-guest-core=off \
-m 81920 \
-overcommit mem-lock=off \
-smp 512,sockets=1,cores=128,threads=4 \
-uuid fd4a5d54-0216-490e-82d2-1d4e89683b3d \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=24,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
-drive file=/home/sath/tests/data/avocado-vt/images/jeos-27-ppc64le_vm1.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:df:24,bus=pci.0,addr=0x1 \
-chardev pty,id=charserial0 \
-device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-M pseries,ic-mode=xics \
-msg timestamp=on

Tags: kvm powerpcm qemu
Satheesh Rajendran (sathnaga) wrote :
summary: ppc64le: KVM guest fails to boot with an error `virtio_scsi: probe of
- virtio1 failed with error -22` on
- master(98b2e3c9ab3abfe476a2b02f8f51813edb90e72d)
+ virtio1 failed with error -22` on master
description: updated
description: updated
David Gibson (dwg) wrote : Re: [Bug 1847440] [NEW] ppc64le: KVM guest fails to boot with an error `virtio_scsi: probe of virtio1 failed with error -22` on master

On Thu, Oct 10, 2019 at 07:16:49AM -0000, Launchpad Bug Tracker wrote:
> You have been subscribed to a public bug by Satheesh Rajendran (sathnaga):
>
> PowerPC KVM Guest fails to boot on current qemu master, bad commit:
> e68cd0cb5cf49d334abe17231a1d2c28b846afa2
>
> Env:
> HW: IBM Power9
> Host Kernel: 5.4.0-rc2-00038-ge3280b54afed
> Guest Kernel: 4.13.9-300.fc27.ppc64le
> Qemu: https://github.com/qemu/qemu.git (master)
> Libvirt: 5.4.0
>
> Guest boot gets stuck:
> ...
> [ OK ] Mounted Kernel Configuration File System.
> [ 7.598740] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
> [ 7.598828] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
> [ 7.598957] virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
> [ 7.599017] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
> [ 7.599123] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
> [ 7.599182] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
> [ 7.620620] synth uevent: /devices/vio: failed to send uevent
> [ 7.620624] vio vio: uevent: failed to send synthetic uevent
> [ OK ] Started udev Coldplug all Devices.
> [ 7.624559] audit: type=1130 audit(1570610300.990:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [ OK ] Reached target System Initialization.
> [ OK ] Reached target Basic System.
> [ OK ] Reached target Remote File Systems (Pre).
> [ OK ] Reached target Remote File Systems.
> [ 7.642961] virtio_scsi: probe of virtio1 failed with error -22
> [ *** ] A start job is running for dev-disk…21b3519a80.device (14s / no limit)
> ...
>
> git bisect, yielded a bad commit
> [e68cd0cb5cf49d334abe17231a1d2c28b846afa2] spapr: Render full FDT on
> ibm,client-architecture-support, reverting this commit boot the guest
> properly.
>
> git bisect start
> # good: [9e06029aea3b2eca1d5261352e695edc1e7d7b8b] Update version for v4.1.0 release
> git bisect good 9e06029aea3b2eca1d5261352e695edc1e7d7b8b
> # bad: [98b2e3c9ab3abfe476a2b02f8f51813edb90e72d] Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
> git bisect bad 98b2e3c9ab3abfe476a2b02f8f51813edb90e72d
> # good: [56e6250ede81b4e4b4ddb623874d6c3cdad4a96d] target/arm: Convert T16, nop hints
> git bisect good 56e6250ede81b4e4b4ddb623874d6c3cdad4a96d
> # good: [5d69cbdfdd5cd6dadc9f0c986899844a0e4de703] tests/tcg: target/s390x: Test MVC
> git bisect good 5d69cbdfdd5cd6dadc9f0c986899844a0e4de703
> # good: [88112488cf228df8b7588c8aa38e16ecd0dff48e] qapi: Make check_type()'s array case a bit more obvious
> git bisect good 88112488cf228df8b7588c8aa38e16ecd0dff48e
> # good: [972bd57689f1e11311d86b290134ea2ed9c7c11e] ppc/kvm: Skip writing DPDES back when in run time state
> git bisect good 972bd57689f1e11311d86b290134ea2ed9c7c11e
> # bad: [1aba8716c8335e88b8c358002a6e1ac89f7dd258] ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
> git bisect bad 1aba8716c8335e88b8c358002a6e1ac89f7dd258
> # bad: [00ed3da9b5c2e66e796a172df3e19545462b9c90] xics: Minor fixes for XICSFabric interfa...


David Gibson (dwg) wrote :

Ok, I just tried booting a guest with virtio-scsi and ic-mode=xics, and I wasn't able to reproduce this problem.

Can you try simplifying your command line to see what options are needed to trigger this?

David Gibson (dwg) wrote :

Oh... are you using the SLOF (guest firmware) image included in the qemu tree, or is it coming from a separate package?

If it's from a separate package, that could be the problem - it needs to be updated before that qemu patch is safe.

Satheesh Rajendran (sathnaga) wrote :

I also tried the SLOF binary compiled from the qemu tree (-bios /usr/local/share/qemu/slof.bin); the same issue persists:

/home/sath/qemu/ppc64-softmmu/qemu-system-ppc64 \
-name guest=vm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-vm1/master-key.aes \
-machine pseries-4.2,accel=kvm,usb=off,dump-guest-core=off \
-bios /usr/local/share/qemu/slof.bin \
-m 81920 \
-overcommit mem-lock=off \
-smp 512,sockets=1,cores=128,threads=4 \
-uuid fd4a5d54-0216-490e-82d2-1d4e89683b3d \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=24,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
-drive file=/home/sath/tests/data/avocado-vt/images/jeos-27-ppc64le_vm1.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:df:24,bus=pci.0,addr=0x1 \
-chardev pty,id=charserial0 \
-device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-M pseries,ic-mode=dual \
-msg timestamp=on

I also tried with ic-mode=xics; same issue.

Host HW:

#lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127

FW: skiboot-v6.3.2

Regards,
-Satheesh

Alexey Kardashevskiy (aik-ozlabs) wrote :

Please provide the entire guest boot output, from SLOF until it gets stuck.
Also please try with -smp 1. Thanks.
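
For context, the reported command line requests `-smp 512,sockets=1,cores=128,threads=4`, while the `lscpu` output above shows a host with 2 sockets, 16 cores per socket and 4 threads per core. A minimal sketch comparing the two, assuming the bare leading number in `-smp` is the vCPU count; `parse_smp` is a hypothetical helper, not QEMU's own option parser:

```python
def parse_smp(option):
    """Split a '-smp' value such as '512,sockets=1,cores=128,threads=4'."""
    fields = {}
    for part in option.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = int(value)
        else:
            # A bare leading number is treated as the vCPU count.
            fields["cpus"] = int(key)
    return fields


guest = parse_smp("512,sockets=1,cores=128,threads=4")
topology_cpus = guest["sockets"] * guest["cores"] * guest["threads"]

# Host values copied from the lscpu output in this report:
# sockets * cores per socket * threads per core.
host_cpus = 2 * 16 * 4

print("guest vCPUs:", guest["cpus"])
print("guest topology product:", topology_cpus)
print("host CPUs:", host_cpus)
```

The guest topology is self-consistent (the product matches the vCPU count), but it is four times the number of host CPUs. Retrying with `-smp 1` removes that variable from the reproduction.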

Satheesh Rajendran (sathnaga) wrote :

Domain vm1 started
Connected to domain vm1
Escape character is ^]
Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0800 (D) : 1af4 1000 virtio [ net ]
                     00 1000 (D) : 1af4 1004 virtio [ scsi ]
Populating /pci@800000020000000/scsi@2
       SCSI: Looking for devices
          100000000000000 DISK : "QEMU QEMU HARDDISK 2.5+"
                     00 1800 (D) : 1b36 000d serial bus [ usb-xhci ]
                     00 2000 (D) : 1af4 1002 unknown-legacy-device*
No NVRAM common partition, re-initializing...
Scanning USB
  XHCI: Initializing
Using default console: /vdevice/vty@30000000

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Trying to load: from: /pci@800000020000000/scsi@2/disk@100000000000000 ... Successfully loaded

OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.13.9-300.fc27.ppc64le (<email address hidden>) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Mon Oct 23 13:28:27 UTC 2017
Detected machine type: 0000000000000101
command line: BOOT_IMAGE=/boot/vmlinuz-4.13.9-300.fc27.ppc64le root=UUID=500d2159-c568-459e-8864-1c21b3519a80 ro console=tty0 console=ttyS0,115200 console=hvc0
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support...Node not supported
Node not supported
 not implemented
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 00000000046a0000
  alloc_top : 0000000010000000
  alloc_top_hi : 0000001400000000
  rmo_top : 0000000010000000
  ram_top : 0000001400000000
instantiating rtas at 0x000000000daf0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x00000000046b0000 -> 0x00000000046b0b3f
Device tree struct 0x00000000046c0000 -> 0x00000000046d0000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000002000000 ...
[ 0.000000] Page sizes from device-tree:
[ 0.000000] Page size shift = 12 AP=0x0
[ 0.000000] Page size shift = 16 AP=0x5
[ 0.000000] Page size shift = 21 AP=0x1
[ 0.000000] Page size shift = 30 AP=0x2
[ 0.000000] Using radix MMU under hypervisor
[ 0.000000] Mapped range 0x0 - 0x1400000000 with 0x40000000
[ 0.000000] Process table c0000013ff000000 and radix root for kernel: c0000000014c0000
[ 0.000000] Linux version 4.13.9-300.fc27.ppc64le (<email address hidden>) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Mon Oct 23 13:28:27 UTC 2017
[ 0.000000] Found initrd at 0xc000000003900000:0xc0000000046967f5
[ 0.000000] Using pSeries machine description
[ 0.000000] bootconsole [udbg0] enabled
[ 0.000000] Partition configured for 2 cpus.
[ 0.000000] CPU maps initialized for 1 thread per core
 -> smp_release_cpus()
spi...

Satheesh Rajendran (sathnaga) wrote :

Same observation even with -smp 1.

Alexey Kardashevskiy (aik-ozlabs) wrote :
Thomas Huth (th-huth) wrote :

The SLOF fix was merged 1.5 years ago, so I assume this can be marked as fixed now.

Changed in qemu:
status: New → Fix Released