[SRU Justification]

A ppc64le (PowerNV) host OOPSes shortly after boot when KVM guests are started.

Cherry-pick patch e47057151422a67ce08747176fa21cb3b526a2c9

Tested at IBM: boot a machine with a KVM guest configured to start at boot. Without this patch, an OOPS is observed; with this patch, no OOPS is observed.
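The pass/fail criterion above can be scripted. The following is a minimal POSIX sh sketch, assuming the kernel log is piped in on stdin. The function name and its exit codes (0 pass, 1 oops, 2 no guest started) are hypothetical. The two search strings are taken from the log lines in this report.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel log read from stdin.
#   return 0 -> a KVM guest was started and no oops was logged (PASS)
#   return 1 -> an oops was logged (FAIL)
#   return 2 -> no KVM guest start was seen, so the run proves nothing
check_kvm_boot_test() {
    log=$(cat)
    # On this 4.4 kernel, kvm_hv prints this line when a guest starts.
    if ! printf '%s\n' "$log" | grep -q 'KVM guest htab at'; then
        echo "INCONCLUSIVE: no KVM guest start found in log" >&2
        return 2
    fi
    # The failing kernel logs "Oops: Unexpected facility unavailable exception".
    if printf '%s\n' "$log" | grep -q 'Oops: '; then
        echo "FAIL: kernel oops logged" >&2
        return 1
    fi
    echo "PASS: guest started, no oops logged"
    return 0
}
```

Run it once the guest is up, e.g. `dmesg | check_kvm_boot_test`. On some systems `dmesg` needs root. Treating "no guest started" as a separate result avoids reporting a pass when the reproducer never actually ran.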

[Regression Potential]
The patch only touches arch/powerpc (arch/powerpc/kvm/book3s_hv.c), so regression potential is limited to that architecture. The patch has been accepted into the kernel stable trees, which suggests others also consider it low risk.

[Original Report]

[ 0.000000] Linux version 4.4.0-93-generic (buildd@bos01-ppc64el-025) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #116-Ubuntu SMP Fri Aug 11 16:30:16 UTC 2017 (Ubuntu 4.4.0-93.116-generic 4.4.79)

[ 380.184554] KVM guest htab at c000007999000000 (order 29), LPID 2
[ 380.527576] Facility 'TM' unavailable, exception at 0xd00000003aad7f10, MSR=9000000000009033
[ 380.527717] Oops: Unexpected facility unavailable exception, sig: 6 [#2]
[ 380.527775] SMP NR_CPUS=2048 NUMA PowerNV
[ 380.527823] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter overlay binfmt_misc bridge stp llc kvm_hv uio_pdrv_genirq uio leds_powernv ipmi_powernv ibmpowernv vmx_crypto powernv_rng ipmi_msghandler kvm_pr kvm autofs4 xfs btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 raid10 ses enclosure mlx4_en be2net lpfc vxlan mlx4_core scsi_transport_fc ip6_udp_tunnel udp_tunnel ipr
[ 380.528781] CPU: 24 PID: 4277 Comm: qemu-system-ppc Tainted: G D 4.4.0-93-generic #116-Ubuntu
[ 380.528861] task: c000000003c389b0 ti: c000001fb2428000 task.ti: c000001fb2428000
[ 380.528929] NIP: d00000003aad7f10 LR: d000000037d52a14 CTR: d00000003aad7e40
[ 380.528997] REGS: c000001fb242b7b0 TRAP: 0f60 Tainted: G D (4.4.0-93-generic)
[ 380.529076] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22024848 XER: 00000000
[ 380.529247] CFAR: d00000003aad7ea4 SOFTE: 1
               GPR00: d000000037d52a14 c000001fb242ba30 d00000003aaec018 c000001fdbf60000
               GPR04: c000001f85800000 c000001fb242bbc0 0000000000000000 0000000000000000
               GPR08: 0000000000000001 c000000003c389b0 0000000000000001 d000000037d578f8
               GPR12: d00000003aad7e40 c00000000fb4e400 0000000000000000 000000000000001f
               GPR16: 00003fff72060000 0000000000800000 00003fff892c4390 00003fff7285f200
               GPR20: 0000010009988430 00000100099affd0 00003fff7285eb60 00000000100c1ff0
               GPR24: 00003ffffbcf4e10 00003fff72040028 0000000000000000 c000001fdbf60000
               GPR28: 0000000000000000 c000001f85800000 c000001fdbf60000 c000001f85800000
[ 380.530119] NIP [d00000003aad7f10] kvmppc_vcpu_run_hv+0xd0/0xff0 [kvm_hv]
[ 380.530188] LR [d000000037d52a14] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 380.530245] Call Trace:
[ 380.530270] [c000001fb242ba30] [c000001fb242bab0] 0xc000001fb242bab0 (unreliable)
[ 380.530353] [c000001fb242bb70] [d000000037d52a14] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 380.530436] [c000001fb242bba0] [d000000037d4f674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 380.530519] [c000001fb242bbe0] [d000000037d43918] kvm_vcpu_ioctl+0x528/0x7b0 [kvm]
[ 380.530602] [c000001fb242bd40] [c0000000002fff60] do_vfs_ioctl+0x480/0x7d0
[ 380.530671] [c000001fb242bde0] [c000000000300384] SyS_ioctl+0xd4/0xf0
[ 380.530742] [c000001fb242be30] [c000000000009204] system_call+0x38/0xb4
[ 380.530837] Instruction dump:
[ 380.530904] e92d02a0 e9290a50 e9290108 792a07e3 41820058 e92d02a0 e9290a50 e9290108
[ 380.531126] 7927e8a4 78e71f87 40820ed8 e92d02a0 <7d4022a6> f9490ee8 e92d02a0 7d4122a6
[ 380.531350] ---[ end trace 8f9b3b82f9a07d76 ]---
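As a worked example, the register dump above already explains the trap. The sketch below decodes two values from it in POSIX sh. The first is the MSR, split into 32-bit halves so plain shell arithmetic never has to hold an unsigned 64-bit value. The second is the faulting instruction word `7d4022a6`, the one in angle brackets in the instruction dump. The bit position of MSR[TM] (bit 32, counting from the least significant bit) and the SPR numbers (TFHAR = 128, TFIAR = 129, TEXASR = 130) are assumed to match the Power ISA and Linux's `arch/powerpc/include/asm/reg.h`. The helper names are hypothetical.

```shell
#!/bin/sh
# Worked decode of the oops above (sketch; helper names are hypothetical).
# MSR=9000000000009033, split into halves so sh never needs unsigned 64-bit.
MSR_HI=0x90000000
MSR_LO=0x00009033

# msr_bit N: print MSR bit N (0 = least significant bit, 63 = most).
msr_bit() {
    if [ "$1" -ge 32 ]; then
        echo $(( ($MSR_HI >> ($1 - 32)) & 1 ))
    else
        echo $(( ($MSR_LO >> $1) & 1 ))
    fi
}

# decode_mfspr WORD: if WORD is mfspr (primary opcode 31, XO 339),
# print "RT SPR"; otherwise return 1.
decode_mfspr() {
    insn=$1
    opcode=$(( ($insn >> 26) & 0x3f ))
    xo=$(( ($insn >> 1) & 0x3ff ))
    if [ "$opcode" -ne 31 ] || [ "$xo" -ne 339 ]; then
        return 1
    fi
    rt=$(( ($insn >> 21) & 0x1f ))
    # The 10-bit SPR number is encoded with its two 5-bit halves swapped.
    spr=$(( ((($insn >> 11) & 0x1f) << 5) | (($insn >> 16) & 0x1f) ))
    echo "$rt $spr"
}

# spr_name N: names for the transactional-memory SPRs only.
spr_name() {
    case $1 in
        128) echo TFHAR ;;
        129) echo TFIAR ;;
        130) echo TEXASR ;;
        *)   echo "SPR $1" ;;
    esac
}

echo "MSR[SF]=$(msr_bit 63) MSR[HV]=$(msr_bit 60) MSR[TM]=$(msr_bit 32)"
set -- $(decode_mfspr 0x7d4022a6)
echo "faulting instruction: mfspr r$1, $(spr_name "$2")"
```

This prints `MSR[SF]=1 MSR[HV]=1 MSR[TM]=0` and `faulting instruction: mfspr r10, TFHAR`. The next word in the dump, `7d4122a6`, decodes the same way to `mfspr r10, TFIAR`. A read of a TM register while MSR[TM] is clear raises a facility unavailable interrupt (TRAP: 0f60), which matches the "Facility 'TM' unavailable" line at the top of the oops.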

According to Cyril, this needs kernel patch e47057151422a67ce08747176fa21cb3b526a2c9.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-93-generic 4.4.0-93.116
ProcVersionSignature: Ubuntu 4.4.0-93.116-generic 4.4.79
Uname: Linux 4.4.0-93-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 1 15:03 seq
 crw-rw---- 1 root audio 116, 33 Sep 1 15:03 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Fri Sep 1 15:34:14 2017
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
JournalErrors:
 Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
       Users in the 'systemd-journal' group can see all messages. Pass -q to
       turn off this notice.
 No journal files were opened due to insufficient permissions.
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub


ProcKernelCmdLine: root=UUID=c3fd6cdf-331d-4c9a-9a51-efabe9deea59 ro splash quiet
ProcLoadAvg: 0.19 0.09 0.09 1/1800 5980
ProcLocks:
 1: POSIX ADVISORY WRITE 1772 00:13:416 0 EOF
 2: POSIX ADVISORY WRITE 4082 00:13:685 0 0
 3: FLOCK ADVISORY WRITE 2762 00:13:631 0 EOF
 4: POSIX ADVISORY WRITE 3080 00:13:622 0 0
 5: FLOCK ADVISORY WRITE 3084 09:01:1074286601 0 EOF
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.4.0-93-generic (buildd@bos01-ppc64el-025) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #116-Ubuntu SMP Fri Aug 11 16:30:16 UTC 2017
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-93-generic N/A
 linux-backports-modules-4.4.0-93-generic N/A
 linux-firmware 1.157.11
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)

cpu_cores: Number of cores present = 24
cpu_coreson: Number of cores online = 24
cpu_smt: SMT is off

Daniel Black (daniel-black) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Daniel Black (daniel-black) wrote :

According to the changelog, this was introduced in 4.4.0-88.111 by "KVM: PPC: Book3S HV: Preserve userspace HTM state properly".
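To tell whether a given kernel package already carries the regressing change, its version can be compared with 4.4.0-88.111. Below is a minimal sh sketch, assuming GNU `sort -V` is available. The function name is hypothetical. It only identifies kernels at or after the regressing version and says nothing about which later version carries the fix.

```shell
#!/bin/sh
# Hypothetical helper: succeed if Ubuntu kernel package version $1 is at or
# after 4.4.0-88.111, the version whose changelog lists the regressing commit.
# It does not know which later version contains the fix.
REGRESSED_IN=4.4.0-88.111

includes_regressing_commit() {
    # GNU sort -V compares the numeric fields numerically, so the first
    # line of output is the older of the two versions.
    oldest=$(printf '%s\n%s\n' "$REGRESSED_IN" "$1" | sort -V | head -n 1)
    [ "$oldest" = "$REGRESSED_IN" ]
}
```

For example, `includes_regressing_commit "$(dpkg-query -W -f='${Version}' linux-image-4.4.0-93-generic)"` succeeds for the 4.4.0-93.116 kernel in this report. On the Ubuntu system itself, `dpkg --compare-versions "$v" ge 4.4.0-88.111` applies Debian's own version ordering and is the more faithful choice. `sort -V` is used here only so the sketch runs without dpkg.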

Daniel Axtens (daxtens)
Changed in linux (Ubuntu):
assignee: nobody → Daniel Axtens (daxtens)
Daniel Black (daniel-black) wrote :

danielgb@p87:~$ apt-get source linux-image-4.4.0-93-generic
danielgb@p87:~$ cd linux-4.4.0/
danielgb@p87:~/linux-4.4.0$ patch -p1 < ../index.html\?id\=e47057151422a67ce08747176fa21cb3b526a2c9
checking file arch/powerpc/kvm/book3s_hv.c
Hunk #1 succeeded at 2708 (offset -503 lines).
danielgb@p87:~/linux-4.4.0$ fakeroot debian/rules clean
danielgb@p87:~/linux-4.4.0$ AUTOBUILD=1 fakeroot debian/rules binary-debs
danielgb@p87:~/linux-4.4.0$ ls ../*deb
../linux-headers-4.4.0-93-generic_4.4.0-93.116_ppc64el.deb ../linux-image-extra-4.4.0-93-generic_4.4.0-93.116_ppc64el.deb ../linux-tools-4.4.0-93-generic_4.4.0-93.116_ppc64el.deb
../linux-image-4.4.0-93-generic_4.4.0-93.116_ppc64el.deb ../linux-tools-4.4.0-93_4.4.0-93.116_ppc64el.deb
danielgb@p87:~/linux-4.4.0$ sudo dpkg -i ../*deb
danielgb@p87:~/linux-4.4.0$ sudo reboot

[ 0.000000] Linux version 4.4.0-93-generic (root@p87) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #116 SMP Mon Sep 4 10:00:39 AEST 2017 (Ubuntu 4.4.0-93.116-generic 4.4.79)

root@p87:~# dmesg | grep -i kvm
[ 259.411176] KVM guest htab at c000007959000000 (order 30), LPID 1
[ 288.766819] KVM guest htab at c000007999000000 (order 29), LPID 2

root@p87:~# dmesg | grep -i oops

Daniel Axtens (daxtens)
description: updated
description: updated
Stefan Bader (smb) wrote :

This was fixed in upstream stable 4.4.80 as "KVM: PPC: Book3S HV: Reload HTM registers explicitly". I will mark this bug as a duplicate of the stable tracking bug for reference.

