KVM on 16.04.3 throws an error

Bug #1709784 reported by bugproxy on 2017-08-10
This bug affects 3 people
Affects                           Status  Importance  Assigned to                             Milestone
The Ubuntu-power-systems project          Critical    Canonical Kernel Team
linux (Ubuntu)                            Critical    Ubuntu on IBM Power Systems Bug Triage
Xenial                                    Critical    Joseph Salisbury
Zesty                                     Critical    Joseph Salisbury
qemu (Ubuntu)                             Undecided   Unassigned

Bug Description

Problem Description
====================
KVM on Ubuntu 16.04.3 throws an error when used

---uname output---
Linux bastion-1 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:37:08 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8348-21C Habanero

---Steps to Reproduce---
 Install 16.04.3

install KVM like:

apt-get install libvirt-bin qemu qemu-slof qemu-system qemu-utils

then exit and log back in so virsh will work without sudo

then run my spawn script

$ cat spawn.sh
#!/bin/bash

img=$1
qemu-system-ppc64 \
-machine pseries,accel=kvm,usb=off -cpu host -m 512 \
-display none -nographic \
-net nic -net user \
-drive "file=$img"

with a freshly downloaded ubuntu cloud image

sudo ./spawn.sh xenial-server-cloudimg-ppc64el-disk1.img

I get no output at all on the console, and the following errors appear in dmesg:

ubuntu@bastion-1:~$ [ 340.180295] Facility 'TM' unavailable, exception at 0xd0000000148b7f10, MSR=9000000000009033
[ 340.180399] Oops: Unexpected facility unavailable exception, sig: 6 [#1]
[ 340.180513] SMP NR_CPUS=2048 NUMA PowerNV
[ 340.180547] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm binfmt_misc joydev input_leds mac_hid opal_prd ofpart cmdlinepart powernv_flash ipmi_powernv ipmi_msghandler mtd at24 uio_pdrv_genirq uio ibmpowernv powernv_rng vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear mlx4_en hid_generic usbhid hid uas usb_storage ast i2c_algo_bit bnx2x ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx4_core drm ahci vxlan libahci ip6_udp_tunnel udp_tunnel mdio libcrc32c
[ 340.181331] CPU: 46 PID: 5252 Comm: qemu-system-ppc Not tainted 4.4.0-89-generic #112-Ubuntu
[ 340.181382] task: c000001e34c30b50 ti: c000001e34ce4000 task.ti: c000001e34ce4000
[ 340.181432] NIP: d0000000148b7f10 LR: d000000014822a14 CTR: d0000000148b7e40
[ 340.181475] REGS: c000001e34ce77b0 TRAP: 0f60 Not tainted (4.4.0-89-generic)
[ 340.181519] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22024848 XER: 00000000
[ 340.181629] CFAR: d0000000148b7ea4 SOFTE: 1
GPR00: d000000014822a14 c000001e34ce7a30 d0000000148cc018 c000001e37bc0000
GPR04: c000001db9ac0000 c000001e34ce7bc0 0000000000000000 0000000000000000
GPR08: 0000000000000001 c000001e34c30b50 0000000000000001 d0000000148278f8
GPR12: d0000000148b7e40 c00000000fb5b500 0000000000000000 000000000000001f
GPR16: 00003fff91c30000 0000000000800000 00003fffa8e34390 00003fff9242f200
GPR20: 00003fff92430010 000001001de5c030 00003fff9242eb60 00000000100c1ff0
GPR24: 00003fffc91fe990 00003fff91c10028 0000000000000000 c000001e37bc0000
GPR28: 0000000000000000 c000001db9ac0000 c000001e37bc0000 c000001db9ac0000
[ 340.182315] NIP [d0000000148b7f10] kvmppc_vcpu_run_hv+0xd0/0xff0 [kvm_hv]
[ 340.182357] LR [d000000014822a14] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 340.182394] Call Trace:
[ 340.182413] [c000001e34ce7a30] [c000001e34ce7ab0] 0xc000001e34ce7ab0 (unreliable)
[ 340.182468] [c000001e34ce7b70] [d000000014822a14] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 340.182522] [c000001e34ce7ba0] [d00000001481f674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 340.182581] [c000001e34ce7be0] [d000000014813918] kvm_vcpu_ioctl+0x528/0x7b0 [kvm]
[ 340.182634] [c000001e34ce7d40] [c0000000002fffa0] do_vfs_ioctl+0x480/0x7d0
[ 340.182678] [c000001e34ce7de0] [c0000000003003c4] SyS_ioctl+0xd4/0xf0
[ 340.182723] [c000001e34ce7e30] [c000000000009204] system_call+0x38/0xb4
[ 340.182766] Instruction dump:
[ 340.182788] e92d02a0 e9290a50 e9290108 792a07e3 41820058 e92d02a0 e9290a50 e9290108
[ 340.182863] 7927e8a4 78e71f87 40820ed8 e92d02a0 <7d4022a6> f9490ee8 e92d02a0 7d4122a6
[ 340.182938] ---[ end trace bc5080cb7d18f102 ]---
[ 340.276202]

This was with the latest ubuntu cloud image. I get the same thing when trying to use virt-install with an ISO image.

I have no way of starting a KVM guest on 16.04.3.
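As a quick way to recognise this symptom on another host, here is a minimal sketch that scans kernel log text for the message shown in the dmesg output above. The function name `has_tm_oops` is hypothetical and not part of any tool mentioned in this report; it only matches the literal string from the trace.

```shell
#!/bin/bash
# Hypothetical helper: exit status 0 if the kernel log text on stdin
# contains the TM facility-unavailable message from this report,
# non-zero otherwise. It matches the literal string only.
has_tm_oops() {
    grep -q -F "Facility 'TM' unavailable"
}

# Usage on a live host (reading the kernel log may need root):
#   dmesg | has_tm_oops && echo "TM facility oops found"
```

A fixed-string match is used so that the quotes and brackets in the kernel message are not interpreted as a regular expression.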

== Comment: #2 - Jason M. Furmanek <email address hidden> - 2017-08-09 17:42:34 ==
I reinstalled with the HWE kernel (4.10).
I can install a VM and see the console, and everything seems fine.

== Comment: #3 - Jason M. Furmanek <email address hidden> - 2017-08-09 17:44:03 ==
I had another system at 16.04.2 (4.4) and updated that one to the latest and it hit the same issue as above.
No qemu or libvirt updates were applied. Just kernel updates and a handful of other stuff.

Seems this issue is specific to the latest kernel

old version worked:
Linux fs7 4.4.0-83-generic #106

new version did not:
Linux fs7 4.4.0-89-generic #112-Ubuntu
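To tell at a glance whether a host is running one of the suspect 4.4 builds, a rough sketch like the following may help. The range used here (ABI 88 through 93 treated as affected) is an assumption pieced together from the comments in this report, not an official list, and the function name `kernel_tm_status` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: classify a kernel release string as printed by
# `uname -r`, for example "4.4.0-89-generic".
# ASSUMPTION (from this thread only): Xenial 4.4.0 builds with ABI
# numbers 88 through 93 carry the regression; -83 was reported good,
# and -94 and later were reported to work.
kernel_tm_status() {
    local release=$1 abi
    case $release in
        4.4.0-*) ;;
        *) echo "not-4.4"; return 0 ;;
    esac
    abi=${release#4.4.0-}   # drop the "4.4.0-" prefix
    abi=${abi%%-*}          # drop the flavour suffix such as "-generic"
    case $abi in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    if [ "$abi" -ge 88 ] && [ "$abi" -le 93 ]; then
        echo "affected"
    else
        echo "not-affected"
    fi
}

# Usage: kernel_tm_status "$(uname -r)"
```

The check only looks at the version string, so it cannot tell whether a locally built or test kernel carries the fix.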

== Comment: #4 - Gustavo Bueno Romero <email address hidden> - 2017-08-09 20:26:42 ==
Looks like 46a704f8409f79fd66567ad3f8a7304830a84293 was backported in -88 but e47057151422a67ce08747176fa21cb3b526a2c9 was not:

[gromero@localhost ubuntu-xenial]$ git remote -vv
origin git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git (fetch)
origin git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git (push)

[gromero@localhost ubuntu-xenial]$ git log Ubuntu-4.4.0-83.106..Ubuntu-4.4.0-89.112 --oneline | fgrep "Preserve userspace HTM state properly"
a97e978 KVM: PPC: Book3S HV: Preserve userspace HTM state properly
[gromero@localhost ubuntu-xenial]$ git log Ubuntu-4.4.0-83.106..Ubuntu-4.4.0-89.112 --oneline | fgrep "Enable TM before accessing TM registers"
[gromero@localhost ubuntu-xenial]$ git tag --contains a97e978574f41ffcf1813c180aba2772d46fbb5b
Ubuntu-4.4.0-88.111
Ubuntu-4.4.0-89.112
Ubuntu-raspi2-4.4.0-1066.74
Ubuntu-raspi2-4.4.0-1067.75
Ubuntu-snapdragon-4.4.0-1068.73
Ubuntu-snapdragon-4.4.0-1069.74

So https://github.com/torvalds/linux/commit/46a704f8409f79fd66567ad3f8a7304830a84293 is present in the ISO but https://github.com/torvalds/linux/commit/e47057151422a67ce08747176fa21cb3b526a2c9 is not.
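The two greps above can be folded into one check. The sketch below reads `git log --oneline` output on stdin and reports which of the two patch subjects it finds; the subjects are the ones grepped for above, while the function name `check_htm_backports` and its three result strings are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: read `git log --oneline` output on stdin and
# report whether the HTM-state patch is present without its follow-up.
check_htm_backports() {
    local log first followup
    log=$(cat)
    first="Preserve userspace HTM state properly"
    followup="Enable TM before accessing TM registers"
    if ! printf '%s\n' "$log" | grep -q -F "$first"; then
        echo "first-patch-absent"
    elif printf '%s\n' "$log" | grep -q -F "$followup"; then
        echo "both-present"
    else
        echo "followup-missing"
    fi
}

# Usage, with the same range as in the commands above:
#   git log Ubuntu-4.4.0-83.106..Ubuntu-4.4.0-89.112 --oneline | check_htm_backports
```

Matching on subject lines rather than hashes is deliberate, since a backport gets a new hash in the Ubuntu tree (46a704f appears there as a97e978).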

bugproxy (bugproxy) on 2017-08-10
tags: added: architecture-ppc64le bugnameltc-157436 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
no longer affects: qemu
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)

Hi,
there was a related case in bug 1664622.

The TL;DR of this was:
1. Even with all qemu patches identified by IBM and upstream while working on this, the issue still showed up.
2. Newer kernels worked (>=4.8, but also 4.4 kernels newer than those on the release itself, which matches the report here).
3. The conclusion was that nothing can or has to be done for qemu [1].
4. There was no way to fix the initial 16.04.0 ISO, but that was considered OK since 16.04.2 worked.

Mapping that to this bug, it seems that a further update to 4.4 might have broken it "again", as identified by Gustavo in this report.

I added all this to complete the context for everybody else, but for now qemu stays at Won't Fix, since the reasons and assumptions haven't changed since our last check on this issue.

On the kernel side, if the two identified patches really need to go together a fix might be needed, but that is for the kernel team to evaluate.

[1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1664622/comments/27

Changed in qemu (Ubuntu):
status: New → Won't Fix
Andrew Cloke (andrew-cloke) wrote :

Thanks Christian. Reassigning to kernel team.

Changed in ubuntu-power-systems:
assignee: Canonical Server Team (canonical-server) → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo) on 2017-08-10
Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: New → In Progress
Changed in qemu (Ubuntu Xenial):
status: New → Won't Fix
no longer affects: qemu (Ubuntu Zesty)
no longer affects: qemu (Ubuntu Xenial)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
Changed in linux (Ubuntu Zesty):
importance: Undecided → Critical
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built test kernels with a pick of commit e47057151422. The test kernels can be downloaded from:

Xenial: http://kernel.ubuntu.com/~jsalisbury/lp1709784/xenial/
Zesty: http://kernel.ubuntu.com/~jsalisbury/lp1709784/zesty/

Can you test these kernels and see if they resolve this bug?

Thanks in advance!

Changed in ubuntu-power-systems:
status: New → In Progress
Breno Leitão (breno-leitao) wrote :

I just tested the Zesty kernel and it worked fine:

# uname -a
Linux sid 4.10.0-30-generic #34~lp1709784 SMP Thu Aug 10 20:01:17 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

# sudo virsh list
 Id    Name                           State
----------------------------------------------------
 1     unstable                       running
 2     debian                         running
 4     1604                           running

# sudo virsh console 1604
Connected to domain 1604
Escape character is ^]

Ubuntu 16.04.3 LTS 1604 hvc0

1604 login:

Breno Leitão (breno-leitao) wrote :

I also tested the Xenial GA kernel:

sid ➜ ~ sudo uname -a
Linux sid 4.4.0-89-generic #112~lp1709784 SMP Thu Aug 10 19:39:43 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

sid ➜ ~ sudo virsh list
 Id    Name                           State
----------------------------------------------------
 2     debian                         running
 3     1604                           running
 4     unstable                       running

sid ➜ ~ sudo virsh console 1604
Connected to domain 1604
Escape character is ^]

Ubuntu 16.04.3 LTS 1604 hvc0

1604 login:

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing. I'll submit SRU requests for both Zesty and Xenial.

Changed in linux (Ubuntu Zesty):
status: In Progress → Invalid
Joseph Salisbury (jsalisbury) wrote :

There is commit 17d381054b1d6f4adc3db623b2066fff41b4dc1a ("KVM: PPC: Book3S HV: Reload HTM registers explicitly") from the v4.4.80 series that is in Xenial master-next. I built a master-next test kernel, which can be downloaded from:

Xenial: http://kernel.ubuntu.com/~jsalisbury/lp1709784/xenial/

Can you test these kernels and see if they resolve this bug?

Thanks in advance!

Amartey Pearson (apearson) wrote :

I ran into this issue as well, and confirmed with the kernel built above (http://kernel.ubuntu.com/~jsalisbury/lp1709784/xenial/) that things now work again. Thanks!

------- Comment From <email address hidden> 2017-09-01 00:40 EDT-------
I tested the test kernel made available by jsalisbury and it worked for me as well.

This is a pretty big blocker - very much hoping this makes it into a new kernel soon.

Joseph Salisbury (jsalisbury) wrote :

@furmanek, The fix is in the Ubuntu-4.4.0-94 kernel, which is the current -proposed kernel.

Would it be possible for you to test the proposed kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.
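For anyone who has not used -proposed before, the rough shape of it is sketched below. This is an illustration, not a substitute for the wiki page: the function name `proposed_source_line` and the file name in the usage notes are made up, and the mirror URL assumes a ppc64el host served from ports.ubuntu.com.

```shell
#!/bin/bash
# Hypothetical helper: print an apt source line for the -proposed
# pocket of the given release.
# ASSUMPTION: ppc64el packages come from ports.ubuntu.com; other
# architectures and local mirrors use a different URL.
proposed_source_line() {
    local release=$1
    echo "deb http://ports.ubuntu.com/ubuntu-ports ${release}-proposed main restricted universe multiverse"
}

# Usage (as root; the .list file name is only an example):
#   proposed_source_line xenial > /etc/apt/sources.list.d/xenial-proposed.list
#   apt-get update
#   apt-get -t xenial-proposed install linux-image-4.4.0-94-generic
```

Installing only the named kernel package with `-t xenial-proposed` keeps the rest of the system on the regular pockets; the wiki page also describes pinning for the same purpose.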

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-04 19:23 EDT-------
Hi jsalisbury,

linux-image-4.4.0-94-generic from -proposed looks fine to me. VMs boot ok and as expected no TRAP 0xF60 (Facility Unavailable) in dmesg:

root@sid:~# apt-get -t xenial-proposed install linux-image-4.4.0-94-generic
<REBOOT>
root@sid:~# uname -a
Linux sid 4.4.0-94-generic #117-Ubuntu SMP Tue Aug 29 08:13:18 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@sid:/# dmesg | fgrep 0f60
root@sid:/# virsh list
 Id    Name                           State
----------------------------------------------------
 1     debian                         running
 2     1604                           running
 4     unstable                       running

root@sid:/# virsh console 2
Connected to domain 1604
Escape character is ^]

Ubuntu 16.04.3 LTS 1604 hvc0

1604 login: ubuntu
Password:
Last login: Mon Aug 14 15:41:29 EDT 2017 from 10.1.0.1 on pts/0
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-30-generic ppc64le)

* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
You have new mail.
1604 ? ~

I understand that no further action from IBM is necessary now. Please, let me know if it's not true.

BTW, do you know when this will likely reach the SRU?

Thanks a lot and best regards,
Gustavo

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-04 20:46 EDT-------
Only to let you know that I just tested this version successfully as well.

Thanks!

Jose Ziviani

There seems to be an update-regression in Artful.

Booting a recent Artful guest now again also throws:
[485006.367929] Facility 'TM' unavailable, exception at 0xd000000021617f10, MSR=9000000000009033
[485006.367943] Oops: Unexpected facility unavailable exception, sig: 6 [#14]
[485006.367945] SMP NR_CPUS=2048 NUMA PowerNV

Booting a Zesty guest triggers the same.

Guest kernel(s): 4.12.0.13.14 / 4.10.0.33.33
Host kernel: 4.4.0-93-generic

I thought this only triggers if the guest kernel version is a bad one.
It seems the host kernel (-93 as shown above) also needs to be upgraded to avoid the issue?
OK, that makes it worse than I thought. I can't reboot to test from -proposed at the moment.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-12 04:23 EDT-------
*** Bug 158561 has been marked as a duplicate of this bug. ***

Ryan Beisner (1chb1n) wrote :

Affects OpenStack on ppc64el. Marking other bug as duplicate (it has logs/attachments fyi).
 https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1716469

Ryan Beisner (1chb1n) wrote :

Confirmed workaround for OpenStack Ocata on Xenial:

### Nova compute host [hwe-edge kernel via MAAS]
ubuntu@node-mawhile:~$ uname -a
Linux node-mawhile 4.11.0-14-generic #20~16.04.1-Ubuntu SMP Wed Aug 9 09:06:18 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

### Nova guest [stock xenial ppc64el cloud image]
[ 0.000000] Linux version 4.4.0-65-generic (buildd@bos01-ppc64el-028) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:48:50 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)

Breno Leitão (breno-leitao) wrote :

Christian,

I am looking at the 4.4 git repository, and version -93 contains exactly the problem described here: it contains commit 46a704f8409f79fd66567ad3f8a7304830a84293 (as a97e978574f41ffcf1813c180aba2772d46fbb5b), but it does not include commit e47057151422a67ce08747176fa21cb3b526a2c9.

That said, we probably want to include e47057151422a67ce08747176fa21cb3b526a2c9 in the 4.4 series as soon as possible.

bugproxy (bugproxy) on 2017-09-13
tags: added: targetmilestone-inin16043
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-19 16:53 EDT-------
I've tested the new kernel 4.4.0-96 and it fixes this issue.

Thank you!

Joseph Salisbury (jsalisbury) wrote :

@furmanek, Thanks for testing.

@Breno, Commit e47057151 was not applied to Xenial. It did not apply cleanly because the following commit is already in Xenial as of 4.4.0-96 (as Xenial SHA1 25e0720):

17d381054b1d6f4adc3db623b2066fff41b4dc1a
("KVM: PPC: Book3S HV: Reload HTM registers explicitly")

Commit 17d381054 came in from the 4.4.80 updates and does what commit e47057151 does and more. Per comment #18, it sounds like it fixes this bug. It would be great if others affected by this bug could also test -proposed.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-20 08:53 EDT-------
> @Breno, Commit e47057151 was not applied to Xenial. It did not apply
> cleanly because the following commit is already in Xenial as of 4.4.0-96(As
> Xenial SHA1 25e0720):
>
> 17d381054b1d6f4adc3db623b2066fff41b4dc1a
> ("KVM: PPC: Book3S HV: Reload HTM registers explicitly")
>
> Commit 17d381054 came in from the 4.4.80 updates and does what commit
> e47057151 does and more. Per comment #18, it sounds like it fixes this bug.
> It would be great if others affected by this bug could also test -proposed.

Sure. We will let you know if we hit this issue on kernel 4.4.0-96+.

Thanks

Andrew Cloke (andrew-cloke) wrote :

Based on comments above, moving to "Fix Released". If this issue re-occurs, please update with a new comment.

Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Released → Fix Committed
Andrew Cloke (andrew-cloke) wrote :

As this ticket only relates to Xenial and 4.4 kernel, marking as "Fix Released".

Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released