Ubuntu18.04[P9 DD2.2 Boston]:Unable to boot power8 compat mode guests(ubuntu14.04.5) (kvm)

Bug #1756254 reported by bugproxy on 2018-03-16
This bug affects 1 person
Affects                            Importance   Assigned to
The Ubuntu-power-systems project   Critical     Canonical Kernel Team
linux (Ubuntu)                     Critical     Joseph Salisbury
Bionic                             Critical     Joseph Salisbury

Bug Description

== Comment: #0 - Praveen K. Pandey <email address hidden> - 2018-03-15 03:18:46 ==
Problem Description :

With Ubuntu 18.04 as the KVM host, installation of an Ubuntu 14.04.5 KVM guest hangs, so the guest cannot be installed with the virt-install command.

Steps to reproduce:
1- Install Ubuntu 18.04 on Boston DD2.2
2- Configure the system as a KVM host
3- Get the ubuntu-14.04.5-server-ppc64el ISO
4- Start the installation with virt-install using the ubuntu-14.04.5 ISO

LOG:

OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.4.0-31-generic (buildd@bos01-ppc64el-007) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #50~14.04.1-Ubuntu SMP Wed Jul 13 01:03:56 UTC 2016 (Ubuntu 4.4.0-31.50~14.04.1-generic 4.4.13)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support...

----- it hung here --------------

root@system :/var/lib/libvirt/images/praveen# uname -a
Linux system 4.15.0-12-generic #13-Ubuntu SMP Wed Mar 7 21:37:03 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@lsystem:/var/lib/libvirt/images/praveen#

root@system:/var/lib/libvirt/images/praveen# qemu-img create -f qcow2 pra-ubuntu14.qcow2 30G
Formatting 'pra-ubuntu14.qcow2', fmt=qcow2 size=32212254720 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@system:/var/lib/libvirt/images/praveen# virt-install --name PRA-1404.5_vm1 --ram 2048 --disk path=/var/lib/libvirt/images/praveen/pra-ubuntu14.qcow2 --vcpus 4 --os-type linux --os-variant generic --network bridge=virbr0 --graphics none --console pty,target_type=serial --cdrom /var/lib/libvirt/images/isos/ubuntu-14.04.5-server-ppc64el.iso
WARNING CDROM media does not print to the text console by default, so you likely will not see text install output. You might want to use --location. See the man page for examples of using --location with CDROM media

Starting install...
Connected to domain PRA-1404.5_vm1
Escape character is ^]
Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0800 (D) : 1af4 1000 virtio [ net ]
                     00 1000 (D) : 1af4 1004 virtio [ scsi ]
Populating /pci@800000020000000/scsi@2
       SCSI: Looking for devices
          100000000000000 CD-ROM : "QEMU QEMU CD-ROM 2.5+"
                     00 1800 (D) : 1b36 000d serial bus [ usb-xhci ]
                     00 2000 (D) : 1af4 1001 virtio [ block ]
                     00 2800 (D) : 1af4 1002 unknown-legacy-device*
No NVRAM common partition, re-initializing...
Scanning USB
  XHCI: Initializing
Using default console: /vdevice/vty@30000000

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Trying to load: from: /pci@800000020000000/scsi@2/disk@100000000000000 ... Successfully loaded

                    GNU GRUB version 2.02~beta2-9ubuntu1.11

 |*Install |

      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line.

OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.4.0-31-generic (buildd@bos01-ppc64el-007) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #50~14.04.1-Ubuntu SMP Wed Jul 13 01:03:56 UTC 2016 (Ubuntu 4.4.0-31.50~14.04.1-generic 4.4.13)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support...

Regards
Praveen

== Comment: #1 - Harish Sriram <email address hidden> - 2018-03-15 04:12:37 ==
With P8 compat mode added to the domain XML, the guest does not boot.

  <cpu mode='host-model' check='partial'>
    <model fallback='allow'>power8</model>
    <topology sockets='1' cores='1' threads='4'/>
  </cpu>

# virsh start kal-bionic_vm2 --console
Domain kal-bionic_vm2 started
Connected to domain kal-bionic_vm2
Escape character is ^]

- Harish
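
For anyone scripting this check, the compat setting can be read back from the domain XML. The following is only a minimal Python sketch: the helper name `cpu_compat_model` and the wrapping `<domain>` element are illustrative assumptions, and only the `<cpu>` element is taken from the comment above.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; only the <cpu> element comes from the report.
DOMAIN_XML = """
<domain type='kvm'>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'>power8</model>
    <topology sockets='1' cores='1' threads='4'/>
  </cpu>
</domain>
"""

def cpu_compat_model(domain_xml):
    """Return (mode, model) from the <cpu> element, or None if it is absent."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        return None
    model = cpu.find("model")
    name = model.text.strip() if model is not None and model.text else None
    return cpu.get("mode"), name

print(cpu_compat_model(DOMAIN_XML))
```

A guest configured as above should report mode `host-model` with model `power8`; a domain without a `<cpu>` element yields `None`.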

== Comment: #5 - Satheesh Rajendran <email address hidden> - 2018-03-15 06:34:29 ==
It looks like the SMT=off and /sys/module/kvm_hv/parameters/indep_threads_mode=N workaround is still needed on Power9 DD2.2 with the Ubuntu 18.04 kernel:

# uname -a
Linux ltc-bostonxx 4.15.0-12-generic #13-Ubuntu SMP Wed Mar 7 21:37:03 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Paul,

Please let us know which patches need to be included.
Thanks in advance.

Regards,
-Satheesh.

== Comment: #6 - Praveen K. Pandey <email address hidden> - 2018-03-15 07:36:25 ==
(In reply to comment #5)
> Looks like we need a SMT=off and
> /sys/module/kvm_hv/parameters/indep_threads_mode =N workaround still in
> Power9 DD2.2, on ubuntu18.04 kernel,
>
> # uname -a
> Linux ltc-bostonxx 4.15.0-12-generic #13-Ubuntu SMP Wed Mar 7 21:37:03 UTC
> 2018 ppc64le ppc64le ppc64le GNU/Linux
>
> Paul,
>
> Pls let us know which patches to be included ?
> Thanks in advance.
>
> Regards,
> -Satheesh.

Hi Satheesh,

  It still does not seem to work with the workaround applied:

root@ltc-boston122:~# cat /sys/module/kvm_hv/parameters/indep_threads_mode
N
root@ltc-boston122:~# ppc64_cpu --smt
SMT is off
root@ltc-boston122:~#
root@ltc-boston122:~# virsh start PRA-bionic_vm1 --console
Domain PRA-bionic_vm1 started
Connected to domain PRA-bionic_vm1
Escape character is ^]

root@ltc-boston122:~#
root@ltc-boston122:~# uname -a
Linux ltc-boston122 4.15.0-12-generic #13-Ubuntu SMP Wed Mar 7 21:37:03 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-boston122:~#

I am trying to boot an already-installed Ubuntu 18.04 guest in P8 compat mode using the XML entry.

Regards
Praveen
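
The host state checked in this comment can be verified from a script. This is a hedged Python sketch, not an official tool: the function names are hypothetical, and it handles only the two `ppc64_cpu --smt` output formats that appear in this report ("SMT is off" and "SMT=4"). The strings would come from reading /sys/module/kvm_hv/parameters/indep_threads_mode and running `ppc64_cpu --smt`.

```python
import re

def parse_smt(ppc64_cpu_output):
    """Return threads per core from `ppc64_cpu --smt` output (1 when SMT is off)."""
    text = ppc64_cpu_output.strip()
    if text == "SMT is off":
        return 1
    match = re.fullmatch(r"SMT=(\d+)", text)
    if match:
        return int(match.group(1))
    # Other output formats are deliberately not guessed at.
    raise ValueError(f"unrecognised ppc64_cpu output: {text!r}")

def workaround_active(indep_threads_mode, ppc64_cpu_output):
    """True when indep_threads_mode is N and SMT is off on the host."""
    return indep_threads_mode.strip() == "N" and parse_smt(ppc64_cpu_output) == 1

print(workaround_active("N\n", "SMT is off\n"))
print(workaround_active("Y\n", "SMT=4\n"))
```

With the values shown in this comment the workaround is active, which matches the observation that the guest still fails to boot with it in place.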

== Comment: #9 - Harish Sriram <email address hidden> - 2018-03-15 07:51:02 ==
Tried with 16.04 and 18.04 guests in P8 compat mode. Still facing the same issue.

PNOR is at the latest level, "version-SUPERMICRO-P9DSU-V1.05-20180308-imp".

- Harish

== Comment: #10 - Satheesh Rajendran <email address hidden> - 2018-03-15 08:15:01 ==
It looks like this kernel patch is missing in Ubuntu 18.04:

commit 00608e1f007e4cf6031485c5630e0e504bceef9b
Author: Paul Mackerras <email address hidden>
Date: Thu Jan 11 16:54:26 2018 +1100

    KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2

    POWER9 chip versions starting with "Nimbus" v2.2 can support running
    with some threads of a core in HPT mode and others in radix mode.
    This means that we don't have to prohibit independent-threads mode
    when running a HPT guest on a radix host, and we don't have to do any
    of the synchronization between threads that was introduced in commit
    c01015091a77 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix
    hosts", 2017-10-19).

    Rather than using up another CPU feature bit, we just do an
    explicit test on the PVR (processor version register) at module
    startup time to determine whether we have to take steps to avoid
    having some threads in HPT mode and some in radix mode (so-called
    "mixed mode"). We test for "Nimbus" (indicated by 0 or 1 in the top
    nibble of the lower 16 bits) v2.2 or later, or "Cumulus" (indicated by
    2 or 3 in that nibble) v1.1 or later.

    Signed-off-by: Paul Mackerras <email address hidden>

Regards,
-Satheesh
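
The PVR test described in the commit message can be illustrated as follows. This is a Python sketch written from the commit text, not a copy of the kernel code: the function name is hypothetical, the constant 0x004E for the POWER9 PVR family is an assumption consistent with the "pvr 004e 1202" value reported for this hardware, and the masks follow the description ("0 or 1" versus "2 or 3" in the top nibble of the lower 16 bits).

```python
PVR_POWER9 = 0x004E  # assumed POWER9 family value in the upper 16 bits of the PVR

def must_avoid_mixed_mode(pvr):
    """True if threads of one core must not mix HPT and radix mode.

    Per the commit message, mixing is allowed on Nimbus v2.2 or later
    and on Cumulus v1.1 or later.
    """
    if (pvr >> 16) != PVR_POWER9:
        return False
    chip = pvr & 0xE000      # top nibble of the lower 16 bits, low bit ignored
    revision = pvr & 0x0FFF  # major.minor, e.g. 0x202 for v2.2
    if chip == 0x0000:       # Nimbus (nibble 0 or 1)
        return revision < 0x202
    if chip == 0x2000:       # Cumulus (nibble 2 or 3)
        return revision < 0x101
    return False

# Example value: the DD2.2 PVR format "004e 1202" (Nimbus v2.2).
print(must_avoid_mixed_mode(0x004E1202))
```

For a Nimbus v2.2 part the function returns False, meaning independent-threads mode does not have to be prohibited for HPT guests on a radix host.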

bugproxy (bugproxy) on 2018-03-16
tags: added: architecture-ppc64le bugnameltc-165741 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-power-systems:
importance: High → Critical
Manoj Iyer (manjo) on 2018-03-19
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
tags: added: triage-g
Changed in linux (Ubuntu Bionic):
status: New → Triaged
status: Triaged → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
bugproxy (bugproxy) on 2018-03-19
tags: removed: bugnameltc-165741 severity-critical triage-g
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: triage-g

------- Comment From <email address hidden> 2018-03-23 00:32 EDT-------
(In reply to comment #5)
> Looks like we need a SMT=off and
> /sys/module/kvm_hv/parameters/indep_threads_mode =N workaround still in
> Power9 DD2.2, on ubuntu18.04 kernel,
>
> # uname -a
> Linux ltc-bostonxx 4.15.0-12-generic #13-Ubuntu SMP Wed Mar 7 21:37:03 UTC
> 2018 ppc64le ppc64le ppc64le GNU/Linux
>
> Paul,
>
> Pls let us know which patches to be included ?

You need:

6964e6a4e489 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded", 2018-01-11)

00608e1f007e ("KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2", 2018-01-11)

cda4a1473313 ("KVM: PPC: Book3S HV: Fix duplication of host SLB entries", 2018-03-22)

The last one is in my kvm-ppc-fixes branch on kernel.org but not in the KVM tree or Linus's tree yet.
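
To check whether a given kernel package carries these three fixes, one option is to search its changelog for the commit subjects. This is a rough Python sketch under the assumption that the changelog lists upstream commit subjects verbatim; the function name is hypothetical and the changelog text has to be supplied by the caller.

```python
# Commit ids and subjects as listed in the comment above.
REQUIRED_FIXES = {
    "6964e6a4e489": "KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded",
    "00608e1f007e": "KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2",
    "cda4a1473313": "KVM: PPC: Book3S HV: Fix duplication of host SLB entries",
}

def missing_fixes(changelog_text):
    """Return {commit id: subject} for fixes whose subject is not in the changelog."""
    # Collapse whitespace so subjects wrapped across changelog lines still match.
    flat = " ".join(changelog_text.split())
    return {sha: subject for sha, subject in REQUIRED_FIXES.items()
            if subject not in flat}

sample = """
  * Example bug title (illustrative changelog fragment)
    - KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9
      v2.2
"""
for sha, subject in sorted(missing_fixes(sample).items()):
    print(sha, subject)
```

A subject match only shows that a patch with that title was applied; it does not prove that the backport is identical to the upstream commit.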

tags: added: bugnameltc-165741 severity-critical
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-25 21:52 EDT-------
@Praveen: please confirm whether this is fixed with the latest Ubuntu 18.04 kernel or the "-proposed" kernel.

This is critical for us to update the bug with latest info.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-26 02:10 EDT-------
Hi
I tried this on the -proposed kernel (4.15.0-13-generic) and am still facing the same issue.

LOG:
root@system ~# uname -a
Linux system 4.15.0-13-generic #14-Ubuntu SMP Sat Mar 17 13:43:15 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@system:~# virsh list
 Id    Name                           State
----------------------------------------------------

root@system:~# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     hari-ubuntu-18.04              shut off
 -     harish-bionic                  shut off
 -     kal-1404.5_vm1                 shut off
 -     kal-16.04.4_vm1                shut off
 -     kal-bionic_vm1                 shut off
 -     kal-bionic_vm2                 shut off
 -     PRA-1404.5_vm1                 shut off
 -     PRA-bionic_vm1                 shut off
 -     preethi1-ubuntu-18.04          shut off
 -     pt1-Ubuntu1804                 shut off
 -     rp0-ubuntu-18.04               shut off
 -     rp2-ubuntu-18.04               shut off

root@system:~# virsh edit PRA-bionic_vm1
Domain PRA-bionic_vm1 XML configuration not changed.

root@system:~# virsh start PRA-bionic_vm1 --console
Domain PRA-bionic_vm1 started
Connected to domain PRA-bionic_vm1
Escape character is ^]

Regards
Praveen

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-26 08:19 EDT-------
Canonical,

Please check the comment from Paul M and verify that these patches are incorporated into the Bionic -proposed version.
-------------
6964e6a4e489 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded", 2018-01-11)

00608e1f007e ("KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2", 2018-01-11)

cda4a1473313 ("KVM: PPC: Book3S HV: Fix duplication of host SLB entries", 2018-03-22)

The last one is in my kvm-ppc-fixes branch on kernel.org but not in the KVM tree or Linus's tree yet.
----------

We have tested with the -proposed version (4.15.0-13-generic) and are still facing the same issue.

LOG:
root@system ~# uname -a
Linux system 4.15.0-13-generic #14-Ubuntu SMP Sat Mar 17 13:43:15 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Manoj Iyer (manjo) on 2018-03-26
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-04 05:18 EDT-------
Compat mode is fixed on the latest Ubuntu build:

On Host:

# ppc64_cpu --smt
SMT=4
# cat /sys/module/kvm_hv/parameters/indep_threads_mode
Y
# uname -a
Linux ltc-boston114 4.15.0-14-generic #15-Ubuntu SMP Mon Apr 2 19:47:43 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

In guest:
root@ubuntu:~# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: 2.2 (pvr 004e 1202)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
NUMA node0 CPU(s): 0,1
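
The guest-side check above can be automated by parsing the lscpu output. A minimal Python sketch follows, assuming the key/value layout shown in this comment; the function names are hypothetical, and treating "004e" in the Model line as the POWER9 PVR family is an assumption.

```python
def lscpu_fields(lscpu_output):
    """Parse `lscpu` output into a dict keyed by the text before the first colon."""
    fields = {}
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def runs_in_power8_compat(lscpu_output):
    """True when the guest reports an architected POWER8 on a 004e (POWER9) PVR."""
    fields = lscpu_fields(lscpu_output)
    return (fields.get("Model name", "").startswith("POWER8 (architected)")
            and "pvr 004e" in fields.get("Model", ""))

# Lines copied from the guest output in this comment.
GUEST_LSCPU = """\
Architecture: ppc64le
Thread(s) per core: 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
"""
print(runs_in_power8_compat(GUEST_LSCPU))
```

The "(architected)" model name is what indicates compat mode here; the physical PVR still identifies the underlying DD2.2 chip.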

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-15.16

---------------
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine
    type(pseries-bionic) complaining "KVM implementation does not support
    Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update to
      irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module
    failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Manoj Iyer (manjo) on 2018-04-16
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released