Ubuntu17.04: Kernel Oops: Exception in kernel mode, sig: 5 [#1] during Avocado KVM Test runs [Regression]

Bug #1680390 reported by bugproxy
This bug affects 1 person
Affects                            Status  Importance  Assigned to       Milestone
The Ubuntu-power-systems project   New     Undecided   Unassigned
linux (Ubuntu)                     New     Undecided   Taco Screen team

Bug Description

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2017-03-27 12:30:45 ==
---Problem Description---
The kernel hit an Oops while running Avocado (KVM) tests: "Oops: Exception in kernel mode, sig: 5 [#1]"

Contact Information = <email address hidden>

---uname output---
Linux ltc-test-ci1 4.10.0-14-generic #16-Ubuntu SMP Fri Mar 17 15:19:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = power 8 ppc64le

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 1. Run Avocado(kvm) tests
#git clone git://git.linux.ibm.com/ltc-test/avocado-fvt-wrapper.git;cd avocado-fvt-wrapper
#python avocado-setup.py --bootstrap --run-suite guest_cpu --guest-os Ubuntu.16.04.2.ppc64le --only-filter virtio_scsi virtio_net qcow2

2. After some time, the traces below were seen.

Stack trace output:
 [20751.909458] ------------[ cut here ]------------
[20751.909461] kernel BUG at /build/linux-HLNhAK/linux-4.10.0/include/linux/swapops.h:129!
[20751.909542] Oops: Exception in kernel mode, sig: 5 [#1]
[20751.909549] SMP NR_CPUS=2048
[20751.909549] NUMA
[20751.909555] PowerNV
[20751.909583] Modules linked in: vhost_net macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 kvm_hv kvm_pr kvm tcm_fc libfc usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_qla2xxx qla2xxx scsi_transport_fc ib_srpt iscsi_target_mod tcm_loop vhost_scsi vhost target_core_user target_core_file target_core_iblock target_core_pscsi target_core_mod ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6t_rpfilter ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_security ip6table_mangle ip6table_raw ip6table_nat iptable_security iptable_mangle iptable_raw iptable_nat ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack
[20751.910225] binfmt_misc powernv_rng powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq uio leds_powernv vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace configfs iscsi_tcp libiscsi_tcp sunrpc libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas crc32c_vpmsum tg3 ipr
[20751.910629] CPU: 24 PID: 6926 Comm: CPU 24/KVM Not tainted 4.10.0-14-generic #16-Ubuntu
[20751.910700] task: c0000007b29ac000 task.stack: c0000007f07b8000
[20751.910759] NIP: c00000000030d748 LR: c00000000030d658 CTR: 0000000000000000
[20751.910828] REGS: c0000007f07bb3b0 TRAP: 0700 Not tainted (4.10.0-14-generic)
[20751.910897] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
[20751.910903] CR: 44882882 XER: 00000000
[20751.910984] CFAR: c00000000030d884 SOFTE: 1
               GPR00: c00000000030d658 c0000007f07bb630 c00000000144c900 f000000001f9f1f0
               GPR04: c0000007e7c7e0e0 f000000001f9f1f0 000000001f001c61 00000000611c001f
               GPR08: c0000000015bc900 0000000000000001 0000000000000001 0000000000e0c7e7
               GPR12: 0000000000002200 c00000000fb8d800 0000000000000000 000000000007fe1d
               GPR16: 0000000000010000 00003fff121c0000 0000000000000000 0000000088000000
               GPR20: 0000000020000000 0000000088000000 0000000022000000 c000000004990000
               GPR24: c0000000015be3d8 c0000007f05d3700 c0000007b972c030 0000000000000000
               GPR28: c0000007f07bb710 3e000000000d611c f000000003584700 f000000001f9f1f0
[20751.911580] NIP [c00000000030d748] __migration_entry_wait+0x128/0x2a0
[20751.911639] LR [c00000000030d658] __migration_entry_wait+0x38/0x2a0
[20751.911697] Call Trace:
[20751.911722] [c0000007f07bb630] [c00000000030d658] __migration_entry_wait+0x38/0x2a0 (unreliable)
[20751.911807] [c0000007f07bb670] [c0000000002bbc6c] do_swap_page+0x73c/0x9a0
[20751.911866] [c0000007f07bb6f0] [c0000000002bfa98] handle_mm_fault+0xac8/0x1600
[20751.911937] [c0000007f07bb7e0] [c0000000002b4104] __get_user_pages+0x194/0x4e0
[20751.912008] [c0000007f07bb890] [c0000000002b47e4] get_user_pages_unlocked+0xf4/0x280
[20751.912079] [c0000007f07bb930] [c0000000002b59ac] get_user_pages_fast+0xac/0x100
[20751.912152] [c0000007f07bb980] [d00000000f66ca74] kvmppc_book3s_hv_page_fault+0x2bc/0xbb0 [kvm_hv]
[20751.912236] [c0000007f07bba70] [d00000000f6696f8] kvmppc_vcpu_run_hv+0xe60/0x1220 [kvm_hv]
[20751.912312] [c0000007f07bbb80] [d00000000f6131ac] kvmppc_vcpu_run+0x34/0x48 [kvm]
[20751.912387] [c0000007f07bbba0] [d00000000f61030c] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[20751.912462] [c0000007f07bbbe0] [d00000000f603db8] kvm_vcpu_ioctl+0x500/0x780 [kvm]
[20751.912534] [c0000007f07bbd40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[20751.912594] [c0000007f07bbde0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[20751.912654] [c0000007f07bbe30] [c00000000000b184] system_call+0x38/0xe0
[20751.912713] Instruction dump:
[20751.912749] 3d020017 79293448 39481a68 ebca0000 7fde4a14 e93e0020 712a0001 4082014c
[20751.912822] 7fc9f378 e9290000 7d2948f8 792907e0 <0b090000> 39400000 3bbe001c 39000001
[20751.912899] ---[ end trace 5eaae2f83c5daa20 ]---

System Dump Info:
  The system is not configured to capture a system dump.
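To compare traces like the one above across reports, it can help to reduce an Oops to its BUG location and the ordered list of call-trace symbols. The following is a minimal sketch, not a tool used by anyone in this report; the function name `parse_oops` and the regular expressions are assumptions based only on the log format pasted here.

```python
import re

# Matches call-trace frames such as:
#   [c0000007f07bb670] [c0000000002bbc6c] do_swap_page+0x73c/0x9a0
# optionally followed by a module name in brackets, e.g. [kvm_hv].
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)
# Matches lines such as: kernel BUG at /path/to/swapops.h:129!
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")


def parse_oops(log_text):
    """Return ((path, line) or None, [(symbol, module or None), ...])."""
    bug = None
    frames = []
    for line in log_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            bug = (match.group("path"), int(match.group("line")))
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return bug, frames


if __name__ == "__main__":
    sample = (
        "[20751.909461] kernel BUG at /build/linux-HLNhAK/linux-4.10.0/include/linux/swapops.h:129!\n"
        "[20751.911807] [c0000007f07bb670] [c0000000002bbc6c] do_swap_page+0x73c/0x9a0\n"
        "[20751.912152] [c0000007f07bb980] [d00000000f66ca74] kvmppc_book3s_hv_page_fault+0x2bc/0xbb0 [kvm_hv]\n"
    )
    print(parse_oops(sample))
```

Two reports whose frame lists start with the same symbols are likely, though not guaranteed, to be hitting the same code path.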

== Comment: #3 - IRANNA D. ANKAD <email address hidden> - 2017-03-28 01:50:15 ==
This is a regression from the 4.10.0-13 kernel and is blocking our regression testing.

== Comment: #10 - VIPIN K. PARASHAR <email address hidden> - 2017-03-30 03:44:13 ==

root@ltc-test-ci1:~# uname -a
Linux ltc-test-ci1 4.10.0-15-generic #17-Ubuntu SMP Fri Mar 24 17:50:37 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-test-ci1:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="17.04 (Zesty Zapus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Zesty Zapus (development branch)"
VERSION_ID="17.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty
root@ltc-test-ci1:~# tail /proc/cpuinfo
processor : 79
cpu : POWER8E (raw), altivec supported
clock : 2061.000000MHz
revision : 2.1 (pvr 004b 0201)

timebase : 512000000
platform : PowerNV
model : 8247-21L
machine : PowerNV 8247-21L
firmware : OPAL
root@ltc-test-ci1:~#

== Comment: #11 - VIPIN K. PARASHAR <email address hidden> - 2017-03-30 03:58:04 ==
Mar 24 07:15:57
===========

[ 1955.041619] ------------[ cut here ]------------
[ 1955.041623] kernel BUG at /build/linux-HLNhAK/linux-4.10.0/include/linux/swapops.h:129!
[ 1955.041633] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1955.041637] SMP NR_CPUS=2048
[ 1955.041638] NUMA
[ 1955.041641] PowerNV
[ 1955.041645] Modules linked in: vhost_net macvtap macvlan rpcsec_gss_krb5 nfsv4 nfs fscache xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 kvm_hv kvm_pr kvm tcm_fc libfc usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_qla2xxx qla2xxx scsi_transport_fc ib_srpt iscsi_target_mod tcm_loop vhost_scsi vhost target_core_user target_core_file target_core_iblock target_core_pscsi target_core_mod ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_raw ip6table_security ip6table_mangle ip6table_nat iptable_raw iptable_security iptable_mangle iptable_nat ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
[ 1955.041724] nf_defrag_ipv6 nf_nat nf_conntrack binfmt_misc powernv_rng powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler leds_powernv uio_pdrv_genirq uio vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas crc32c_vpmsum tg3 ipr
[ 1955.041786] CPU: 40 PID: 7306 Comm: CPU 13/KVM Not tainted 4.10.0-14-generic #16-Ubuntu
[ 1955.041792] task: c0000007937b5a00 task.stack: c0000007f2a48000
[ 1955.041796] NIP: c00000000030d748 LR: c00000000030d658 CTR: 0000000000000000
[ 1955.041801] REGS: c0000007f2a4b3b0 TRAP: 0700 Not tainted (4.10.0-14-generic)
[ 1955.041805] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 1955.041813] CR: 44882882 XER: 00000000
[ 1955.041819] CFAR: c00000000030d884 SOFTE: 1
[ 1955.041819] GPR00: c00000000030d658 c0000007f2a4b630 c00000000144c900 f000000001fb8b30
[ 1955.041819] GPR04: c0000007ee2cd2c8 f000000001fb8b30 000000001f005977 000000007759001f
[ 1955.041819] GPR08: c0000000015bc900 0000000000000001 0000000000000001 0000000000d02cee
[ 1955.041819] GPR12: 0000000000002200 c00000000fb96800 0000000000000000 000000000000005a
[ 1955.041819] GPR16: 0000000000010000 00003ffe30590000 0000000000000000 0000000088000000
[ 1955.041819] GPR20: 0000000020000000 0000000088000000 0000000022000000 c0000000fd3d0000
[ 1955.041819] GPR24: c0000000015be3d8 c0000007c3c66880 c0000007ed9b40d8 0000000000000000
[ 1955.041819] GPR28: c0000007f2a4b710 3e00000000077759 f000000001ddd640 f000000001fb8b30
[ 1955.041874] NIP [c00000000030d748] __migration_entry_wait+0x128/0x2a0
[ 1955.041879] LR [c00000000030d658] __migration_entry_wait+0x38/0x2a0
[ 1955.041883] Call Trace:
[ 1955.041886] [c0000007f2a4b630] [c00000000030d658] __migration_entry_wait+0x38/0x2a0 (unreliable)
[ 1955.041894] [c0000007f2a4b670] [c0000000002bbc6c] do_swap_page+0x73c/0x9a0
[ 1955.041900] [c0000007f2a4b6f0] [c0000000002bfa98] handle_mm_fault+0xac8/0x1600
[ 1955.041906] [c0000007f2a4b7e0] [c0000000002b4104] __get_user_pages+0x194/0x4e0
[ 1955.041912] [c0000007f2a4b890] [c0000000002b47e4] get_user_pages_unlocked+0xf4/0x280
[ 1955.041918] [c0000007f2a4b930] [c0000000002b59ac] get_user_pages_fast+0xac/0x100
[ 1955.041927] [c0000007f2a4b980] [d00000000f7aca74] kvmppc_book3s_hv_page_fault+0x2bc/0xbb0 [kvm_hv]
[ 1955.041935] [c0000007f2a4ba70] [d00000000f7a96f8] kvmppc_vcpu_run_hv+0xe60/0x1220 [kvm_hv]
[ 1955.041947] [c0000007f2a4bb80] [d00000000f7531ac] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 1955.041958] [c0000007f2a4bba0] [d00000000f75030c] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 1955.041967] [c0000007f2a4bbe0] [d00000000f743db8] kvm_vcpu_ioctl+0x500/0x780 [kvm]
[ 1955.041974] [c0000007f2a4bd40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[ 1955.041980] [c0000007f2a4bde0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[ 1955.041986] [c0000007f2a4be30] [c00000000000b184] system_call+0x38/0xe0
[ 1955.041990] Instruction dump:
[ 1955.041994] 3d020017 79293448 39481a68 ebca0000 7fde4a14 e93e0020 712a0001 4082014c
[ 1955.042003] 7fc9f378 e9290000 7d2948f8 792907e0 <0b090000> 39400000 3bbe001c 39000001
[ 1955.042016] ---[ end trace 1c0e9a056f95491f ]---

Mar 27 11:54:27
============

[20751.909458] ------------[ cut here ]------------
[20751.909461] kernel BUG at /build/linux-HLNhAK/linux-4.10.0/include/linux/swapops.h:129!
[20751.909542] Oops: Exception in kernel mode, sig: 5 [#1]
[20751.909549] SMP NR_CPUS=2048
[20751.909549] NUMA
[20751.909555] PowerNV
[20751.909583] Modules linked in: vhost_net macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 kvm_hv kvm_pr kvm tcm_fc libfc usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_qla2xxx qla2xxx scsi_transport_fc ib_srpt iscsi_target_mod tcm_loop vhost_scsi vhost target_core_user target_core_file target_core_iblock target_core_pscsi target_core_mod ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6t_rpfilter ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_security ip6table_mangle ip6table_raw ip6table_nat iptable_security iptable_mangle iptable_raw iptable_nat ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack
[20751.910225] binfmt_misc powernv_rng powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq uio leds_powernv vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace configfs iscsi_tcp libiscsi_tcp sunrpc libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas crc32c_vpmsum tg3 ipr
[20751.910629] CPU: 24 PID: 6926 Comm: CPU 24/KVM Not tainted 4.10.0-14-generic #16-Ubuntu
[20751.910700] task: c0000007b29ac000 task.stack: c0000007f07b8000
[20751.910759] NIP: c00000000030d748 LR: c00000000030d658 CTR: 0000000000000000
[20751.910828] REGS: c0000007f07bb3b0 TRAP: 0700 Not tainted (4.10.0-14-generic)
[20751.910897] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
[20751.910903] CR: 44882882 XER: 00000000
[20751.910984] CFAR: c00000000030d884 SOFTE: 1
[20751.910984] GPR00: c00000000030d658 c0000007f07bb630 c00000000144c900 f000000001f9f1f0
[20751.910984] GPR04: c0000007e7c7e0e0 f000000001f9f1f0 000000001f001c61 00000000611c001f
[20751.910984] GPR08: c0000000015bc900 0000000000000001 0000000000000001 0000000000e0c7e7
[20751.910984] GPR12: 0000000000002200 c00000000fb8d800 0000000000000000 000000000007fe1d
[20751.910984] GPR16: 0000000000010000 00003fff121c0000 0000000000000000 0000000088000000
[20751.910984] GPR20: 0000000020000000 0000000088000000 0000000022000000 c000000004990000
[20751.910984] GPR24: c0000000015be3d8 c0000007f05d3700 c0000007b972c030 0000000000000000
[20751.910984] GPR28: c0000007f07bb710 3e000000000d611c f000000003584700 f000000001f9f1f0
[20751.911580] NIP [c00000000030d748] __migration_entry_wait+0x128/0x2a0
[20751.911639] LR [c00000000030d658] __migration_entry_wait+0x38/0x2a0
[20751.911697] Call Trace:
[20751.911722] [c0000007f07bb630] [c00000000030d658] __migration_entry_wait+0x38/0x2a0 (unreliable)
[20751.911807] [c0000007f07bb670] [c0000000002bbc6c] do_swap_page+0x73c/0x9a0
[20751.911866] [c0000007f07bb6f0] [c0000000002bfa98] handle_mm_fault+0xac8/0x1600
[20751.911937] [c0000007f07bb7e0] [c0000000002b4104] __get_user_pages+0x194/0x4e0
[20751.912008] [c0000007f07bb890] [c0000000002b47e4] get_user_pages_unlocked+0xf4/0x280
[20751.912079] [c0000007f07bb930] [c0000000002b59ac] get_user_pages_fast+0xac/0x100
[20751.912152] [c0000007f07bb980] [d00000000f66ca74] kvmppc_book3s_hv_page_fault+0x2bc/0xbb0 [kvm_hv]
[20751.912236] [c0000007f07bba70] [d00000000f6696f8] kvmppc_vcpu_run_hv+0xe60/0x1220 [kvm_hv]
[20751.912312] [c0000007f07bbb80] [d00000000f6131ac] kvmppc_vcpu_run+0x34/0x48 [kvm]
[20751.912387] [c0000007f07bbba0] [d00000000f61030c] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[20751.912462] [c0000007f07bbbe0] [d00000000f603db8] kvm_vcpu_ioctl+0x500/0x780 [kvm]
[20751.912534] [c0000007f07bbd40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[20751.912594] [c0000007f07bbde0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[20751.912654] [c0000007f07bbe30] [c00000000000b184] system_call+0x38/0xe0
[20751.912713] Instruction dump:
[20751.912749] 3d020017 79293448 39481a68 ebca0000 7fde4a14 e93e0020 712a0001 4082014c
[20751.912822] 7fc9f378 e9290000 7d2948f8 792907e0 <0b090000> 39400000 3bbe001c 39000001
[20751.912899] ---[ end trace 5eaae2f83c5daa20 ]---

As pasted above, two instances of Oops are seen in kernel logs.

== Comment: #15 - VIPIN K. PARASHAR <email address hidden> - 2017-03-30 11:34:06 ==

From Linux source
============

/*
 * Something used the pte of a page under migration. We need to
 * get to the page and wait until migration is finished.
 * When we return from this function the fault will be retried.
 */
void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
                                spinlock_t *ptl)
{
..
..
       page = migration_entry_to_page(entry);

static inline struct page *migration_entry_to_page(swp_entry_t entry)
{
        struct page *p = pfn_to_page(swp_offset(entry));
        /*
         * Any use of migration entries may only occur while the
         * corresponding page is locked
         */
        BUG_ON(!PageLocked(p)); <------ Oops here
        return p;
}

The kernel Oops is triggered by a BUG_ON in the kernel, hit while servicing a
KVM ioctl and the subsequent page fault on pages that are being migrated.

The same issue has been noticed on Intel as well:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1677057
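
The invariant behind the BUG_ON quoted above can be illustrated with a small user-space model. This is only a sketch of the rule "a migration entry may only be used while its page is locked"; the `Page` class, the `KernelBug` exception and the pfn values are hypothetical stand-ins, not kernel code.

```python
class Page:
    """Toy stand-in for struct page; only the lock bit is modelled."""

    def __init__(self, pfn, locked=False):
        self.pfn = pfn
        self.locked = locked


class KernelBug(Exception):
    """Stands in for the trap raised by the kernel's BUG_ON()."""


def migration_entry_to_page(pages, pfn):
    """Look up the page behind a migration entry, enforcing the lock rule."""
    page = pages[pfn]
    # Mirrors BUG_ON(!PageLocked(p)): the page must still be locked,
    # i.e. its migration must still be in progress.
    if not page.locked:
        raise KernelBug("BUG_ON(!PageLocked(p))")
    return page


if __name__ == "__main__":
    # Hypothetical pfns: one page mid-migration, one already unlocked.
    pages = {0x10: Page(0x10, locked=True), 0x11: Page(0x11, locked=False)}
    print(migration_entry_to_page(pages, 0x10).pfn)
    try:
        migration_entry_to_page(pages, 0x11)
    except KernelBug as exc:
        print("would Oops:", exc)
```

In this model the second lookup corresponds to the situation in the traces: a migration entry is still visible in a page table although the page it refers to is no longer locked.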

bugproxy (bugproxy) wrote : sosreport - Host

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-152928 severity-critical targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
summary:
- Ubuntu 17.04: Kernel Oops: Exception in kernel mode, sig: 5 [#1] during Avocado KVM Test runs [Regression]
+ Ubuntu17.04: Kernel Oops: Exception in kernel mode, sig: 5 [#1] during Avocado KVM Test runs [Regression]

Tim Gardner (timg-tpi) wrote :

Satheesh Rajendran - please try the kernel intended for release (Ubuntu-4.10.0-17.19) that is currently staged in -proposed. There have been multiple stable updates applied since 4.10.0-14. I'd also like to know if this is reproducible given that I've seen several bugs complaining of the swapops BUGON, though so far all of those bugs seem to have been related to the use of Firefox.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-04-06 13:36 EDT-------
(In reply to comment #21)
> Satheesh Rajendran - please try the kernel intended for release
> (Ubuntu-4.10.0-17.19) that is currently staged in -proposed. There have been
> multiple stable updates applied since 4.10.0-14. I'd also like to know if
> this is reproducible given that I've seen several bugs complaining of the
> swapops BUGON, though so far all of those bugs seem to have been related to
> the use of Firefox.

Updated to the -proposed kernel (4.10.0-18-generic) and running the tests now; I will post the results.

Regards,
-Satheesh

bugproxy (bugproxy) wrote : dmesg

------- Comment on attachment From <email address hidden> 2017-04-07 05:36 EDT-------

I am able to hit the issue with 4.10.0-18-generic:
#uname -a
Linux xxx 4.10.0-18-generic #20-Ubuntu SMP Wed Apr 5 17:17:06 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

The system becomes unresponsive. I tried to run sosreport, but it could not complete and hit the following traces:

# sosreport
[45667.573198] INFO: rcu_sched self-detected stall on CPU
[45667.573273] 40-...: (5708015 ticks this GP) idle=fb9/140000000000001/0 softirq=427145/427152 fqs=2849867
[45667.573287] (t=5708055 jiffies g=269170 c=269169 q=847028)
[45667.573355] Task dump for CPU 32:
[45667.573393] numad R running task 0 2709 1 0x00042004
[45667.573467] Call Trace:
[45667.573494] [c000000002e17560] [c0000007edf388b8] 0xc0000007edf388b8 (unreliable)
[45667.573569] Task dump for CPU 40:
[45667.573607] CPU 0/KVM R running task 0 44171 1 0x00042004
[45667.573681] Call Trace:
[45667.573709] [c0000007e9ce2fc0] [c00000000011eb1c] sched_show_task+0xcc/0x150 (unreliable)
[45667.573784] [c0000007e9ce3030] [c000000000b5b798] rcu_dump_cpu_stacks+0xec/0x120
[45667.573859] [c0000007e9ce3080] [c00000000016d5b0] rcu_check_callbacks+0x930/0xb30
[45667.573935] [c0000007e9ce31b0] [c000000000176dd8] update_process_times+0x48/0x90
[45667.574010] [c0000007e9ce31e0] [c00000000018e2d0] tick_sched_handle.isra.7+0x30/0xb0
[45667.574085] [c0000007e9ce3210] [c00000000018e3b4] tick_sched_timer+0x64/0xd0
[45667.574160] [c0000007e9ce3250] [c0000000001779e4] __hrtimer_run_queues+0x124/0x420
[45667.574234] [c0000007e9ce32e0] [c000000000178918] hrtimer_interrupt+0xf8/0x330
[45667.574310] [c0000007e9ce33b0] [c000000000023e8c] __timer_interrupt+0x8c/0x270
[45667.574386] [c0000007e9ce3400] [c00000000002428c] timer_interrupt+0x9c/0xe0
[45667.574449] [c0000007e9ce3430] [c0000000000090a4] decrementer_common+0x114/0x120
[45667.574526] --- interrupt: 901 at _raw_spin_lock+0x74/0xe0
[45667.574526] LR = follow_page_pte+0x120/0x830
[45667.574625] [c0000007e9ce3720] [8010018000000000] 0x8010018000000000 (unreliable)
[45667.574701] [c0000007e9ce3750] [c0000000002b4250] follow_page_pte+0x120/0x830
[45667.574776] [c0000007e9ce37e0] [c0000000002b54fc] __get_user_pages+0x10c/0x4e0
[45667.574851] [c0000007e9ce3890] [c0000000002b5c64] get_user_pages_unlocked+0xf4/0x280
[45667.574927] [c0000007e9ce3930] [c0000000002b6e2c] get_user_pages_fast+0xac/0x100
[45667.575004] [c0000007e9ce3980] [d00000000f62d074] kvmppc_book3s_hv_page_fault+0x2bc/0xbc0 [kvm_hv]
[45667.575093] [c0000007e9ce3a70] [d00000000f629a30] kvmppc_vcpu_run_hv+0xbc8/0x1220 [kvm_hv]
[45667.575174] [c0000007e9ce3b80] [d00000000f5932bc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[45667.575254] [c0000007e9ce3ba0] [d00000000f59036c] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[45667.575333] [c0000007e9ce3be0] [d00000000f583db8] kvm_vcpu_ioctl+0x500/0x780 [kvm]
[45667.575409] [c0000007e9ce3d40] [c00000000035c674] do_vfs_ioctl+0xd4/0x8c0
[45667.575473] [c0000007e9ce3de0] [c00000000035cf34] SyS_ioctl+0xd4/0xf0
[45667.575537] [c0000007e9ce3e30] [c00000000000b184] system_call+0x38/0xe0
[45667.575599] Task dump for CPU 48:
[45667.575636] CPU 28/KVM R running task 0 44199 1 0x...

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-04-10 00:00 EDT-------
I tried with kdump enabled and an updated kernel, and I am still hitting the issue.

# uname -a
Linux ltc-test-ci1 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

# [11827.612620] kernel BUG at /build/linux-mYrikn/linux-4.10.0/include/linux/swapops.h:129!
[11827.612748] Oops: Exception in kernel mode, sig: 5 [#1]
[11827.612796] SMP NR_CPUS=2048
[11827.612797] NUMA
[11827.612832] PowerNV
[11827.612881] Modules linked in: vhost_net macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 kvm_hv kvm_pr kvm tcm_fc libfc usb_f_tcm tcm_usb_gadget libcomposite udc_core tcm_qla2xxx qla2xxx scsi_transport_fc ib_srpt iscsi_target_mod tcm_loop vhost_scsi vhost target_core_user target_core_file target_core_iblock target_core_pscsi target_core_mod ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_security ip6table_nat ip6table_mangle ip6table_raw iptable_security iptable_nat iptable_mangle iptable_raw ebtable_filter ebtables openvswitch ip6table_filter ip6_tables nf_conntrack_ipv6 nf_nat_ipv6 iptable_filter nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack
[11827.613527] binfmt_misc vmx_crypto ipmi_powernv ipmi_devintf ipmi_msghandler leds_powernv uio_pdrv_genirq powernv_rng uio powernv_op_panel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas crc32c_vpmsum tg3 ipr
[11827.613935] CPU: 40 PID: 74758 Comm: CPU 17/KVM Not tainted 4.10.0-19-generic #21-Ubuntu
[11827.614006] task: c000000e3998c600 task.stack: c000000e39900000
[11827.614065] NIP: c00000000030ea08 LR: c00000000030e918 CTR: 0000000000000000
[11827.614135] REGS: c000000e399033b0 TRAP: 0700 Not tainted (4.10.0-19-generic)
[11827.614205] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
[11827.614211] CR: 44882882 XER: 00000000
[11827.614293] CFAR: c00000000030eb44 SOFTE: 1
[11827.614293] GPR00: c00000000030e918 c000000e39903630 c00000000145cb00 f000000001fa4770
[11827.614293] GPR04: c0000007e91d2010 f000000001fa4770 000000001f00025c 000000005c02001f
[11827.614293] GPR08: c0000000015ccb00 0000000000000001 0000000000000001 0000000000201de9
[11827.614293] GPR12: 0000000000002200 c000000007b56800 0000000000000000 000000000007fe03
[11827.614293] GPR16: 0000000000010000 00003ffede020000 0000000000000000 0000000088000000
[11827.614293] GPR20: 0000000020000000 0000000088000000 0000000022000000 c000000f1b320000
[11827.614293] GPR24: c0000000015ce3d8 c000000e397c6e00 c000000f1d3dfb90 0000000000000000
[11827.614293] GPR28: c000000e39903710 3e00000000055c02 f000000001570080 f000000001fa4770
[11827.614897] NIP [c00000000030ea08] __migration_entry_wait+0x128/0x2a0
[11827.614956] LR [c00000000030e918] __migration_entry_wait+0x38/0x2a0
[11827.615015] Call Trace:
[11827.615040] [c0...


bugproxy (bugproxy)
tags: added: severity-high
removed: severity-critical

bugproxy (bugproxy) wrote : alinefm-avocado-tests-output

------- Comment (attachment only) From <email address hidden> 2017-04-13 09:19 EDT-------

bugproxy (bugproxy) wrote : alinefm-avocado-tests-4.10.0-19

------- Comment (attachment only) From <email address hidden> 2017-04-19 16:59 EDT-------

Seth Forshee (sforshee) wrote :

We've been getting a number of other reports of this problem. I've been trying to reproduce it locally, without any luck so far. However, it does appear that the problem happens only in zesty kernels and not with upstream 4.10 stable kernels, which suggests that the cause is one of the backports or sauce patches we've applied. The stack trace suggests a problem with migration (or possibly KSM).

Going through those sorts of commits related to the kernel mm code, a few stand out based on the size of the changes and the code they're touching. All of them are backports requested for power9 on bug #1671613.

6e2a092a48d3 mm: introduce page_vma_mapped_walk()
3000e033152a mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()
c228a1037cd6 mm/ksm: handle protnone saved writes when making page write protect

Some upstream bug fixes reference these patches too (but no mention of the BUG we're hitting):

d19469e84158 power/mm: update pte_write and pte_wrprotect to handle savedwrite
d75450ff40df mm: fix page_vma_mapped_walk() for ksm pages

I'm going to try to set up the Avocado tests today to see if that allows me to reproduce. If you are able to reproduce reliably, you could try applying the fixes above to see if they help, or try bisecting the patches applied to zesty on top of upstream 4.10 to identify the patch which causes the issues.
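
The bisection suggested above can be sketched as a binary search over the ordered list of patches applied on top of upstream 4.10. This is a minimal sketch under stated assumptions: the tree with no extra patches is good, the tree with all of them is bad, and a single patch flips it from good to bad. The function name `first_bad_patch`, the `is_bad` callback and the patch labels are hypothetical; in practice `is_bad` would mean building and booting a kernel with those patches and running the Avocado suite until it either oopses or passes.

```python
def first_bad_patch(patches, is_bad):
    """Return the first patch whose application makes the tree bad.

    patches: patches in the order they are applied on top of a good base.
    is_bad:  callable taking the list of applied patches, returning True
             when the resulting kernel reproduces the problem.
    """
    if not patches:
        raise ValueError("need at least one patch to bisect")
    good, bad = 0, len(patches)  # first `good` patches: OK; first `bad`: broken
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(patches[:mid]):
            bad = mid
        else:
            good = mid
    return patches[bad - 1]


if __name__ == "__main__":
    # Hypothetical patch labels; "patch-C" plays the role of the culprit.
    series = ["patch-A", "patch-B", "patch-C", "patch-D", "patch-E"]
    print(first_bad_patch(series, lambda applied: "patch-C" in applied))
```

Each step halves the remaining range, so a series of N patches needs about log2(N) build-and-test cycles. Because the Oops here only appears after some time under load, each "good" verdict needs a long enough run to be trustworthy.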

bugproxy (bugproxy)
tags: removed: bugnameltc-152928 severity-high

Seth Forshee (sforshee) wrote :

Realized that I made a mistake above. Rather than d19469e84158 I meant this one:

4b0ece6fa016 mm: migrate: fix remove_migration_pte() for ksm pages

Dennis Sheil (dennis-sheil) wrote :

The top of the stacktrace is the same as bug #1674838.

Seth Forshee (sforshee) wrote :

Please note that we did confirm this was caused by a couple of the Power9 enablement patches from bug #1671613:

 6e2a092a48d3 mm: introduce page_vma_mapped_walk()
 3000e033152a mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()

These seem to only be providing context for the third patch:

 c228a1037cd6 mm/ksm: handle protnone saved writes when making page write protect

Based on the commit messages I don't believe they were intended to provide any functional change. The first claims to only be introducing a new interface, and the second says that write_protect_page() should use the new interface "for consistency."

For that reason I've proposed that we revert the patches and replace them with a backport of c228a1037cd6 instead.

https://lists.ubuntu.com/archives/kernel-team/2017-May/083976.html

However, 6e2a092a48d3 does contain some changes that seem unrelated to the new interface. Could you please advise whether or not these need to be reapplied?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 85742ac5b32e..a7bac4f2b78a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2129,9 +2129,12 @@ static void freeze_page(struct page *page)
 static void unfreeze_page(struct page *page)
 {
        int i;
-
-       for (i = 0; i < HPAGE_PMD_NR; i++)
-               remove_migration_ptes(page + i, page + i, true);
+       if (PageTransHuge(page)) {
+               remove_migration_ptes(page, page, true);
+       } else {
+               for (i = 0; i < HPAGE_PMD_NR; i++)
+                       remove_migration_ptes(page + i, page + i, true);
+       }
 }

 static void __split_huge_page_tail(struct page *head, int tail,

bugproxy (bugproxy)
tags: added: bugnameltc-152928 severity-high

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-18 15:39 EDT-------
>
> However 6e2a092a48d3 does contain some changes that seem unrelated to the
> new interface, could you please advise whether or not these need to be
> reapplied?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 85742ac5b32e..a7bac4f2b78a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2129,9 +2129,12 @@ static void freeze_page(struct page *page)
>  static void unfreeze_page(struct page *page)
>  {
>         int i;
> -
> -       for (i = 0; i < HPAGE_PMD_NR; i++)
> -               remove_migration_ptes(page + i, page + i, true);
> +       if (PageTransHuge(page)) {
> +               remove_migration_ptes(page, page, true);
> +       } else {
> +               for (i = 0; i < HPAGE_PMD_NR; i++)
> +                       remove_migration_ptes(page + i, page + i, true);
> +       }
>  }
>
>  static void __split_huge_page_tail(struct page *head, int tail,

The above change should have been applied independently of all the other changes in 6e2a092a48d3.
It avoids calling remove_migration_ptes() on every sub-page when only the head page needs to be removed from migration. The code works with or without the change, but the change saves some cycles.
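
The saving described above can be illustrated by counting calls in a small model of the two versions of unfreeze_page(). This is a sketch only: the function name `unfreeze_page_calls` is hypothetical, and the value 512 used for HPAGE_PMD_NR is an assumption for illustration, since the real value depends on the architecture and page size.

```python
def unfreeze_page_calls(is_trans_huge, hpage_pmd_nr, patched):
    """Count remove_migration_ptes() calls made by unfreeze_page().

    is_trans_huge: models PageTransHuge(page).
    hpage_pmd_nr:  models HPAGE_PMD_NR, the number of sub-pages.
    patched:       True for the version with the PageTransHuge() check.
    """
    if patched and is_trans_huge:
        # Patched path: a single call on the head page.
        return 1
    # Original code, and the patched fallback branch: one call per sub-page.
    return hpage_pmd_nr


if __name__ == "__main__":
    nr = 512  # assumed HPAGE_PMD_NR, for illustration only
    print(unfreeze_page_calls(True, nr, patched=False))
    print(unfreeze_page_calls(True, nr, patched=True))
```

The result of the operation is the same in both versions, which matches the comment that the code works either way; only the number of calls differs.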
