Ubuntu 18.04 Machine crashed while running ltp.

Bug #1761729 reported by bugproxy
Affects                            Status   Importance  Assigned to            Milestone
The Ubuntu-power-systems project   Invalid  Critical    Canonical Kernel Team
linux (Ubuntu)                     Invalid  Critical    Joseph Salisbury
  Bionic                           Invalid  Critical    Joseph Salisbury

Bug Description

---Problem Description---
Ubuntu 18.04 [ Briggs P8 ]: Machine crashed while running ltp.

---Environment---
Kernel Build: Ubuntu 18.04
System Name : ltc-briggs2
Model/Type : P8
Platform : BML

---Uname output---

root@ltc-briggs2:~# uname -a
Linux ltc-briggs2 4.15.0-13-generic #14-Ubuntu SMP Sat Mar 17 13:43:15 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

---Steps to reproduce---

$ git clone https://github.com/linux-test-project/ltp.git
$ cd ltp
$ make autotools
$ ./configure
$ make
$ make install

ltp
=====

root@ltc-briggs2:~#
root@ltc-briggs2:~# [10781.098337] LTP: starting fs_inod01 (fs_inod $TMPDIR 10 10 10)
[10782.837910] LTP: starting linker01 (linktest.sh 1000 1000)
[10784.504474] LTP: starting openfile01 (openfile -f10 -t10)
[10784.534953] LTP: starting inode01
[10784.550767] LTP: starting inode02
[10784.739104] LTP: starting stream01
[10784.740840] LTP: starting stream02
[10784.742487] LTP: starting stream03
[10784.744532] LTP: starting stream04
[10784.746087] LTP: starting stream05
[10784.747722] LTP: starting ftest01
[10785.142054] LTP: starting ftest02
[10785.158852] LTP: starting ftest03
[10785.404760] LTP: starting ftest04
[10785.527197] LTP: starting ftest05
[10785.937164] LTP: starting ftest06
[10785.958360] LTP: starting ftest07
[10786.463382] LTP: starting ftest08
[10786.592998] LTP: starting lftest01 (lftest 100)
[10786.672707] LTP: starting writetest01 (writetest)
[10786.774292] LTP: starting fs_di (fs_di -d $TMPDIR)
[10792.973510] LTP: starting proc01 (proc01 -m 128)
[10793.865686] ICMPv6: process `proc01' is using deprecated sysctl (syscall) net.ipv6.neigh.default.base_reachable_time - use net.ipv6.neigh.default.base_reachable_time_ms instead
[10795.785593] LTP: starting read_all_dev (read_all -d /dev -e '/dev/watchdog?(0)' -q -r 10)
[10795.895774] NET: Registered protocol family 40
[10795.918763] Bluetooth: Core ver 2.22
[10795.918866] NET: Registered protocol family 31
[10795.918909] Bluetooth: HCI device and connection manager initialized
[10795.918955] Bluetooth: HCI socket layer initialized
[10795.918991] Bluetooth: L2CAP socket layer initialized
[10795.919032] Bluetooth: SCO socket layer initialized
[10798.374850] usercopy: kernel memory exposure attempt detected from 0000000029431ea4 (<kernel text>) (1023 bytes)
[10798.374952] ------------[ cut here ]------------
[10798.374988] kernel BUG at /build/linux-2BXDjB/linux-4.15.0/mm/usercopy.c:72!
[10798.375041] Oops: Exception in kernel mode, sig: 5 [#1]
[10798.375080] LE SMP NR_CPUS=2048 [10871.343999650,5] OPAL: Switch to big-endian OS
NUMA PowerNV
[10798.375117] [10876.190849323,5] OPAL: Switch to little-endian OS
Modules linked in: hci_vhci bluetooth ecdh_generic vhost_vsock cuse vmw_vsock_virtio_transport_common userio vsock uhid vhost_net vhost tap snd_seq snd_seq_device snd_timer snd soundcore binfmt_misc sctp quota_v2 quota_tree nls_iso8859_1 ntfs xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_hv kvm idt_89hpesx vmx_crypto ofpart cmdlinepart ipmi_powernv powernv_flash ipmi_devintf mtd ipmi_msghandler ibmpowernv opal_prd at24 powernv_rng joydev input_leds mac_hid uio_pdrv_genirq uio sch_fq_codel nfsd ib_iser rdma_cm auth_rpcgss iw_cm nfs_acl lockd ib_cm grace iscsi_tcp
[10798.375636] libiscsi_tcp libiscsi sunrpc scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure scsi_transport_sas hid_generic usbhid hid ib_core qla2xxx ast i2c_algo_bit ttm mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme_fc crct10dif_vpmsum nvme_fabrics ahci mlxfw crc32c_vpmsum i40e drm devlink scsi_transport_fc megaraid_sas libahci
[10798.375961] CPU: 87 PID: 4085 Comm: read_all Not tainted 4.15.0-13-generic #14-Ubuntu
[10798.376013] NIP: c0000000003c76f0 LR: c0000000003c76ec CTR: 00000000300378e8
[10798.376068] REGS: c0000076c63aba00 TRAP: 0700 Not tainted (4.15.0-13-generic)
[10798.376120] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002222 XER: 20000000
[10798.376176] CFAR: c00000000018cce4 SOFTE: 1
[10798.376176] GPR00: c0000000003c76ec c0000076c63abc80 c0000000016eaf00 0000000000000064
[10798.376176] GPR04: c000007ffc1cce18 c000007ffc1e4368 9000000000009033 000000000000040f
[10798.376176] GPR08: 0000000000000007 c0000000011c3a74 0000007ffb010000 9000000000001003
[10798.376176] GPR12: 0000000000002200 c000000007a8bd00 0000000000000000 0000000000000000
[10798.376176] GPR16: 0000000000000000 0000000000000000 0000000000000006 00007ffff7a0a018
[10798.376176] GPR20: 000008bb551c8908 000008bb551c88f8 000008bb551c88c8 c0000076c63abe00
[10798.376176] GPR24: 0000000000010000 0000000000000000 00007ffff7a0a018 c0000076c63abe00
[10798.376176] GPR28: c0000000000003ff 0000000000000001 00000000000003ff c000000000000000
[10798.376619] NIP [c0000000003c76f0] __check_object_size+0x140/0x270
[10798.376662] LR [c0000000003c76ec] __check_object_size+0x13c/0x270
[10798.376706] Call Trace:
[10798.376724] [c0000076c63abc80] [c0000000003c76ec] __check_object_size+0x13c/0x270 (unreliable)
[10798.376787] [c0000076c63abd00] [c0000000008268a4] read_mem+0x84/0x220
[10798.376835] [c0000076c63abd70] [c0000000003d109c] __vfs_read+0x3c/0x70
[10798.376880] [c0000076c63abd90] [c0000000003d118c] vfs_read+0xbc/0x1b0
[10798.376925] [c0000076c63abde0] [c0000000003d1788] SyS_read+0x68/0x110
[10798.377012] [c0000076c63abe30] [c00000000000b184] system_call+0x58/0x6c
[10798.377057] Instruction dump:
[10798.377086] 2fbd0000 419e010c 3c82ff8b 3ca2ff94 3884c360 38a5ad68 3c62ff8b 7fc8f378
[10798.377140] 7fe6fb78 3863c370 4bdc55b5 60000000 <0fe00000> 60000000 60000000 60420000
[10798.377195] ---[ end trace 21abd4753a69334c ]---
[10798.445038]
[10798.445135] Sending IPI to other CPUs
[10798.446688] IPI complete
[10798.449081] kexec: waiting for cpu 0 (physical 16) to enter OPAL
[10798.450224] kexec: waiting for cpu 23 (physical 47) to enter OPAL
[10798.451396] kexec: waiting for cpu 54 (physical 94) to enter OPAL
[10800.049202] kexec: Starting switchover sequence.
[ 1.078053] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 1.078057] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 1.165219] vio vio: uevent: failed to send synthetic uevent
/dev/nvme0n1p2: recovering journal
/dev/nvme0n1p2: clean, 14017353/122101760 files, 57953106/488376576 blocks
-.mount
sys-kernel-debug.mount
setvtrgb.service
dev-hugepages.mount
dev-mqueue.mount
kmod-static-nodes.service
lvm2-lvmetad.service
systemd-remount-fs.service
systemd-tmpfiles-setup-dev.service
systemd-random-seed.service
lvm2-monitor.service
systemd-udevd.service
systemd-modules-load.service
sys-fs-fuse-connections.mount
sys-kernel-config.mount
systemd-sysctl.service
systemd-networkd.service
swapfile.swap
[ 5.177490] vio vio: uevent: failed to send synthetic uevent
systemd-udev-trigger.service
keyboard-setup.service
systemd-journald.service
[ 5.458352] qla2xxx [0020:01:00.0]-00c6:17: MSI-X: Failed to enable support with 32 vectors, using 10 vectors.
apparmor.service
systemd-journal-flush.service
systemd-tmpfiles-setup.service
systemd-update-utmp.service
[ 6.119284] qla2xxx [0020:01:00.1]-00c6:18: MSI-X: Failed to enable support with 32 vectors, using 10 vectors.
systemd-timesyncd.service
[ 10.052141] megaraid_sas 0001:03:00.0: Init cmd return status SUCCESS for SCSI host 1
systemd-networkd-wait-online.service
iscsid.service
blk-availability.service
[ 10.675964] kdump-tools[2222]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201804050340/dump-incomplete
lvm2-pvscan@8:195.service
lvm2-pvscan@8:179.service
Copying data : [100.0 %] / eta: 0s
[ 55.227083] kdump-tools[2222]: The kernel version is not supported.
[ 55.227300] kdump-tools[2222]: The makedumpfile operation may be incomplete.
[ 55.227471] kdump-tools[2222]: The dumpfile is saved to /var/crash/201804050340/dump-incomplete.
[ 55.227583] kdump-tools[2222]: makedumpfile Completed.
[ 55.230250] kdump-tools[2222]: * kdump-tools: saved vmcore in /var/crash/201804050340
[ 55.311695] kdump-tools[2222]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201804050340/dmesg.201804050340
[ 55.330032] kdump-tools[2222]: The kernel version is not supported.
[ 55.330206] kdump-tools[2222]: The makedumpfile operation may be incomplete.
[ 55.330302] kdump-tools[2222]: The dmesg log is saved to /var/crash/201804050340/dmesg.201804050340.
[ 55.330416] kdump-tools[2222]: makedumpfile Completed.
[ 55.330533] kdump-tools[2222]: * kdump-tools: saved dmesg content in /var/crash/201804050340
[ 55.334722] kdump-tools[2222]: Thu, 05 Apr 2018 03:40:44 -0500
[ 55.338419] kdump-tools[2222]: Rebooting.
[ 55.546343] mlx5_core 0021:01:00.1: mlx5_enter_error_state:121:(pid 2715): start
[ 55.546414] mlx5_core 0021:01:00.1: mlx5_enter_error_state:128:(pid 2715): end
[ 55.942498] mlx5_core 0021:01:00.0: mlx5_enter_error_state:121:(pid 2715): start
[ 55.942631] mlx5_core 0021:01:00.0: mlx5_enter_error_state:128:(pid 2715): end
[ 59.836381] reboot: Restarting system
[10963.485916127,5] OPAL: Reboot request...
  5.31149|Ignoring boot flags, incorrect version 0x0
  5.52090|ISTEP 6. 3
  6.16670|ISTEP 6. 4
  6.16957|ISTEP 6. 5
  8.74865|HWAS|PRESENT> DIMM[03]=00AA00AA00AA00AA
  8.74865|HWAS|PRESENT> Membuf[04]=4444000000000000
  8.74866|HWAS|PRESENT> Proc[05]=C000000000000000
 14.03690|ISTEP 6. 6
 14.11948|ISTEP 6. 7
 16.75478|ISTEP 6. 8
 16.91585|ISTEP 6. 9
 17.47534|ISTEP 6.10
 17.55249|ISTEP 6.11
 19.29629|ISTEP 6.12
 19.29926|ISTEP 6.13
 19.30139|ISTEP 7. 1
 19.51889|ISTEP 7. 2

== Comment: #7 - Vaishnavi Bhat <email address hidden> - 2018-04-06 04:52:31 ==
A kernel memory exposure attempt is detected, and BUG() is called from the code shown below:
mm/usercopy.c:72

      KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-13-generic
    DUMPFILE: dump.201804050340 [PARTIAL DUMP]
        CPUS: 160
        DATE: Thu Apr 5 03:39:16 2018
      UPTIME: 00:48:44
LOAD AVERAGE: 2.78, 11.61, 106.19
       TASKS: 1748
    NODENAME: ltc-briggs2
     RELEASE: 4.15.0-13-generic
     VERSION: #14-Ubuntu SMP Sat Mar 17 13:43:15 UTC 2018
     MACHINE: ppc64le (2926 Mhz)
      MEMORY: 512 GB
       PANIC: "kernel BUG at /build/linux-2BXDjB/linux-4.15.0/mm/usercopy.c:72!"
         PID: 4085
     COMMAND: "read_all"
        TASK: c000007659f23f00 [THREAD_INFO: c0000076c63a8000]
         CPU: 87
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4085 TASK: c000007659f23f00 CPU: 87 COMMAND: "read_all"
 #0 [c0000076c63ab740] crash_kexec at c0000000001e22b0
 #1 [c0000076c63ab780] oops_end at c000000000025888
 #2 [c0000076c63ab800] _exception at c000000000026684
 #3 [c0000076c63ab990] program_check_common at c000000000008da4
 Program Check [700] exception frame:
 R0: c0000000003c76ec R1: c0000076c63abc80 R2: c0000000016eaf00
 R3: 0000000000000064 R4: c000007ffc1cce18 R5: c000007ffc1e4368
 R6: 9000000000009033 R7: 000000000000040f R8: 0000000000000007
 R9: c0000000011c3a74 R10: 0000007ffb010000 R11: 9000000000001003
 R12: 0000000000002200 R13: c000000007a8bd00 R14: 0000000000000000
 R15: 0000000000000000 R16: 0000000000000000 R17: 0000000000000000
 R18: 0000000000000006 R19: 00007ffff7a0a018 R20: 000008bb551c8908
 R21: 000008bb551c88f8 R22: 000008bb551c88c8 R23: c0000076c63abe00
 R24: 0000000000010000 R25: 0000000000000000 R26: 00007ffff7a0a018
 R27: c0000076c63abe00 R28: c0000000000003ff R29: 0000000000000001
 R30: 00000000000003ff R31: c000000000000000
 NIP: c0000000003c76f0 MSR: 9000000000029033 OR3: c00000000018cce4
 CTR: 00000000300378e8 LR: c0000000003c76ec XER: 0000000020000000
 CCR: 0000000028002222 MQ: 0000000000000001 DAR: 0000000000000000
 DSISR: 0000000000000000 Syscall Result: 0000000000000000
 #4 [c0000076c63abc80] __check_object_size at c0000000003c76f0
 [Link Register] [c0000076c63abc80] __check_object_size at c0000000003c76ec (unreliable)
 #5 [c0000076c63abd00] read_mem at c0000000008268a4
 #6 [c0000076c63abd70] __vfs_read at c0000000003d109c
 #7 [c0000076c63abd90] vfs_read at c0000000003d118c
 #8 [c0000076c63abde0] sys_read at c0000000003d1788
 #9 [c0000076c63abe30] system_call at c00000000000b184
 System Call [c01] exception frame:
 R0: 0000000000000003 R1: 00007ffff7a09ae0 R2: 0000753ec21b7f00
 R3: 0000000000000006 R4: 00007ffff7a0a018 R5: 00000000000003ff
 R6: 0000000000004000 R7: 0000753ec21898c4 R8: 900000010000d033
 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
 R12: 0000000000000000 R13: 0000753ec224a8d0
 NIP: 0000753ec2188580 MSR: 900000010000d033 OR3: 0000000000000006
 CTR: 0000000000000000 LR: 000008bb551b5f20 XER: 0000000000000000
 CCR: 0000000042002244 MQ: 0000000000000001 DAR: 0000753ec21affa8
 DSISR: 0000000040000000 Syscall Result: 0000000000000006
crash> dis -s c0000000003c76f0
FILE: /build/linux-2BXDjB/linux-4.15.0/mm/usercopy.c
LINE: 72

static void report_usercopy(const void *ptr, unsigned long len,
                            bool to_user, const char *type)
{
        pr_emerg("kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
                to_user ? "exposure" : "overwrite",
                to_user ? "from" : "to", ptr, type ? : "unknown", len);
        /*
         * For greater effect, it would be nice to do do_group_exit(),
         * but BUG() actually hooks all the lock-breaking and per-arch
         * Oops code, so that is used here instead.
         */
        BUG();
}
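
For triage scripts, the message produced by the pr_emerg() call above can be matched mechanically. The following is a minimal Python sketch, assuming the format string quoted above is unchanged; USERCOPY_RE and parse_usercopy are illustrative names, not part of the kernel or LTP. Note that kernels from 4.15 onward hash values printed with %p, so the address in the log is most likely not the real kernel address.

```python
import re

# Mirrors the pr_emerg() format string from report_usercopy() quoted above.
USERCOPY_RE = re.compile(
    r"kernel memory (?P<kind>exposure|overwrite) attempt detected "
    r"(?P<direction>from|to) (?P<ptr>\S+) "
    r"\((?P<type>[^)]*)\) \((?P<length>\d+) bytes\)"
)


def parse_usercopy(line):
    """Return the fields of a usercopy report line, or None if no match."""
    match = USERCOPY_RE.search(line)
    if match is None:
        return None
    return {
        "to_user": match.group("kind") == "exposure",
        "ptr": match.group("ptr"),          # hashed by %p, not a real address
        "type": match.group("type"),
        "length": int(match.group("length")),
    }


# Line copied from the console log in this report.
line = (
    "[10798.374850] usercopy: kernel memory exposure attempt detected "
    "from 0000000029431ea4 (<kernel text>) (1023 bytes)"
)
report = parse_usercopy(line)
print(report)
```

The "type" field is the useful part here: it names the region the rejected copy overlapped, which in this report is kernel text.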

From the logs, I see that the memory exposure attempt happens shortly after the Bluetooth driver is initialized. This might be an issue with the default Bluetooth driver provided by the distro.

[10795.918866] NET: Registered protocol family 31
[10795.918909] Bluetooth: HCI device and connection manager initialized
[10795.918955] Bluetooth: HCI socket layer initialized
[10795.918991] Bluetooth: L2CAP socket layer initialized
[10795.919032] Bluetooth: SCO socket layer initialized
[10798.374850] usercopy: kernel memory exposure attempt detected from 0000000029431ea4 (<kernel text>) (1023 bytes)
[10798.374952] ------------[ cut here ]------------
[10798.374988] kernel BUG at /build/linux-2BXDjB/linux-4.15.0/mm/usercopy.c:72!
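
The marked word in the instruction dump of the original oops, <0fe00000>, can be decoded by hand to check that the faulting instruction is a deliberate trap rather than a bad memory access. A rough Python sketch, assuming the standard Power ISA D-form layout for twi (primary opcode 3); decode_trap_word is an illustrative helper, not an existing tool.

```python
def decode_trap_word(word):
    """Split a 32-bit Power ISA D-form word into the fields used by twi."""
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode; 3 is twi
        "to": (word >> 21) & 0x1F,      # trap condition mask
        "ra": (word >> 16) & 0x1F,      # register compared with the immediate
        "si": word & 0xFFFF,            # 16-bit immediate
    }


# Word marked with <...> in the "Instruction dump" of the original oops.
fields = decode_trap_word(0x0FE00000)
is_unconditional_trap = fields["opcode"] == 3 and fields["to"] == 31
print(fields, is_unconditional_trap)
```

With TO = 31 every comparison condition is selected, so the instruction always traps. That is consistent with "TRAP: 0700" (program check) and "sig: 5" in the oops, and with BUG() having been reached on purpose.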

bugproxy (bugproxy) wrote : console log

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-166461 severity-critical targetmilestone-inin---
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

bugproxy (bugproxy) wrote : messages log

Default Comment by Bridge

bugproxy (bugproxy) wrote : syslog

Default Comment by Bridge

bugproxy (bugproxy) wrote : nohup.out

Default Comment by Bridge

bugproxy (bugproxy) wrote : output of 'log' from the crash prompt.

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Critical
tags: added: triage-g
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16

Changed in linux (Ubuntu):
importance: Undecided → Critical
status: New → Incomplete
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Bionic):
assignee: Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
bugproxy (bugproxy)
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-04-09 07:27 EDT-------
(In reply to comment #12)
> Did this issue start happening after an update/upgrade? Was there a prior
> kernel version where you were not having this particular problem?
>
> Would it be possible for you to test the latest upstream kernel? Refer to
> https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16
> kernel[0].
>
> If this bug is fixed in the mainline kernel, please add the following tag
> 'kernel-fixed-upstream'.
>
> If the mainline kernel does not fix this bug, please add the tag:
> 'kernel-bug-exists-upstream'.
>
> Once testing of the upstream kernel is complete, please mark this bug as
> "Confirmed".
>
> Thanks in advance.
>
> [0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16

Issue is not observed with upstream kernel.

analysis=exit
<<<test_output>>>
tst_test.c:718: CONF: no asm/ldt.h header (only for i386 or x86_64)
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=32 corefile=no
cutime=0 cstime=0
[17434.747234] LTP: starting cve-2018-5803 (sctp_big_chunk)
<<<test_end>>>
<<<test_start>>>
tag=cve-2018-5803 stime=1523271762
cmdline="sctp_big_chunk"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:987: INFO: Timeout per run is 0h 05m 00s
sctp_big_chunk.c:53: INFO: sctp server listen on 52080
sctp_big_chunk.c:68: INFO: bind 3273 additional IP addresses
sctp_big_chunk.c:108: PASS: test doesn't cause crash

Summary:
passed 1
failed 0
skipped 0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=1 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=32
[17435.077813] LTP: starting cve-2018-5803_2 (sctp_big_chunk -a 10000)
<<<test_end>>>
<<<test_start>>>
tag=cve-2018-5803_2 stime=1523271763
cmdline="sctp_big_chunk -a 10000"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:987: INFO: Timeout per run is 0h 05m 00s
sctp_big_chunk.c:53: INFO: sctp server listen on 42982
sctp_big_chunk.c:68: INFO: bind 10000 additional IP addresses
sctp_big_chunk.c:108: PASS: test doesn't cause crash

Summary:
passed 1
failed 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=3 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=306
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20180118-179-g99e4ceee8

###############################################################

Done executing testcases.
LTP Version: 20180118-179-g99e4ceee8
###############################################################

Thanks,
Pavithra
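
The runltp output quoted in this comment is organised as records delimited by <<<test_start>>> and <<<test_end>>>. Below is a minimal Python sketch for pulling the tag and exit status out of such a log, assuming only the layout visible in this comment; parse_ltp_records is a hypothetical helper, not an LTP tool, and it skips lines it cannot tokenize (for example console messages interleaved into a record).

```python
import shlex


def parse_ltp_records(text):
    """Collect key=value fields from each <<<test_start>>>..<<<test_end>>> record."""
    records = []
    current = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "<<<test_start>>>":
            current, section = {}, "header"
        elif line == "<<<test_end>>>":
            if current is not None:
                records.append(current)
            current, section = None, None
        elif line.startswith("<<<") and line.endswith(">>>"):
            section = line.strip("<>")
        elif current is not None and section in ("header", "execution_status"):
            try:
                tokens = shlex.split(line)
            except ValueError:
                continue  # unbalanced quotes, e.g. garbled console output
            for token in tokens:
                key, sep, value = token.partition("=")
                if sep:
                    current[key] = value
    return records


# Record copied from the log in this comment (test output shortened).
sample = """<<<test_start>>>
tag=cve-2018-5803_2 stime=1523271763
cmdline="sctp_big_chunk -a 10000"
contacts=""
analysis=exit
<<<test_output>>>
sctp_big_chunk.c:108: PASS: test doesn't cause crash
<<<execution_status>>>
initiation_status="ok"
duration=3 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=306
<<<test_end>>>
"""

records = parse_ltp_records(sample)
nonzero = [r["tag"] for r in records if r.get("termination_id") != "0"]
print(records[0]["tag"], records[0]["termination_id"], nonzero)
```

Records with a nonzero termination_id are the ones to look at first, such as the record in this log that exited with termination_id=32 after a CONF message.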

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Changed in ubuntu-power-systems:
status: Triaged → Confirmed
Manoj Iyer (manjo)
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a "Reverse" bisect to figure out what commit fixes this bug. We need to identify the last kernel version that had the bug, and the first kernel version that fixed the bug.

Can you test the following kernels and report back:

4.15 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/
4.16-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc1/
4.16-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/

You don't have to test every kernel, just up until the first kernel that does not have the bug.

Thanks in advance!
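
The reverse bisect described in this comment is an ordinary binary search with the roles swapped: the oldest build is known to have the bug, the newest is known to be fixed, and the search narrows down to the first fixed build. A minimal Python sketch follows, assuming the fix is never reverted between the two ends; first_fixed, the build labels, and the has_bug stand-in are all hypothetical and do not reflect test results from this bug.

```python
def first_fixed(builds, has_bug):
    """Return the index of the first build without the bug.

    Assumes builds are in release order, builds[0] has the bug,
    builds[-1] does not, and the bug never comes back once fixed.
    """
    low, high = 0, len(builds) - 1  # invariant: low is buggy, high is fixed
    while high - low > 1:
        mid = (low + high) // 2
        if has_bug(builds[mid]):
            low = mid
        else:
            high = mid
    return high


# Placeholder build labels; in practice these would be kernel builds to boot.
builds = ["build-%d" % i for i in range(8)]
tested = []


def has_bug(build):
    """Stand-in for 'boot this build, run the test, see if it crashes'."""
    tested.append(build)
    return builds.index(build) < 5  # made-up answer for the demo


index = first_fixed(builds, has_bug)
print(builds[index], tested)
```

Each call to has_bug corresponds to one boot-and-test cycle on the affected machine, so halving the range keeps the number of cycles small.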

Manoj Iyer (manjo) wrote :

We have two Briggs systems at Canonical: one is offsite and the other is being actively used by the security cert team. IBM, could you please help test the kernels Joe noted in the previous comment?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-11 08:10 EDT-------
(In reply to comment #14)
> I'd like to perform a "Reverse" bisect to figure out what commit fixes this
> bug. We need to identify the last kernel version that had the bug, and the
> first kernel version that fixed the bug.
>
> Can you test the following kernels and report back:
>
> 4.15 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/
> 4.16-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc1/
> 4.16-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/
>
> You don't have to test every kernel, just up until the first kernel that
> does not have the bug.
>
> Thanks in advance!

Issue is observed both on '4.15 Final' and 4.16-rc1. Yet to test 4.16-rc4.

Attaching logs.

Thanks,
Pavithra

bugproxy (bugproxy) wrote : failure console log

------- Comment (attachment only) From <email address hidden> 2018-04-11 08:11 EDT-------

Joseph Salisbury (jsalisbury) wrote :

If this issue also exists on v4.16-rc4, we would also want to test some of the newer release candidates, such as -rc5, -rc6, etc.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-04-12 04:19 EDT-------
(In reply to comment #18)
> If this issue still also exists on v4.16-rc4, we would also want to test
> some of the newer release candidates, such as -rc5, -rc6, etc.

Issue is observed even with -rc4.

[10700.354365] LTP: starting ftest07
[10700.738992] LTP: starting ftest08
[10700.829909] LTP: starting lftest01 (lftest 100)
[10700.911841] LTP: starting writetest01 (writetest)
[10701.014047] LTP: starting fs_di (fs_di -d $TMPDIR)
[10707.090259] LTP: starting proc01 (proc01 -m 128)
[10707.947509] ICMPv6: process `proc01' is using deprecated sysctl (syscall) net.ipv6.neigh.default.base_reachable_time - use net.ipv6.neigh.default.base_reachable_time_ms instead
[10709.666103] LTP: starting read_all_dev (read_all -d /dev -e '/dev/watchdog?(0)' -q -r 10)
[10709.749772] NET: Registered protocol family 40
[10709.784921] Bluetooth: Core ver 2.22
[10709.785013] NET: Registered protocol family 31
[10709.785049] Bluetooth: HCI device and connection manager initialized
[10709.785092] Bluetooth: HCI socket layer initialized
[10709.785128] Bluetooth: L2CAP socket layer initialized
[10709.785166] Bluetooth: SCO socket layer initialized
[11009.804641] LTP: starting read_all_proc (read_all -d /proc -q -r 10)
[11010.009001] ICMPv6: process `read_all' is using deprecated sysctl (syscall) net.ipv6.neigh.enP2p1s0f0.base_reachable_time - use net.ipv6.neigh.enP2p1s0f0.base_reachable_time_ms instead
[11011.814394] LTP: starting read_all_sys (read_all -d /sys -q -r 10)
[11012.513676] Unable to handle kernel paging request for data at address 0x000000c8
[11012.513775] Faulting instruction address: 0xd00000001533dcb8
[11012.513824] Oops: Kernel access of bad area, sig: 11 [#1]
[11012.513841] Unable to handle kernel paging request for data at address 0x000000c8
[11012.513864] LE
[11012.513926] Faulting instruction address: 0xd00000001533dcb8
[11012.513928] SMP NR_CPUS=2048 NUMA PowerNV
[11012.514025] Modules linked in: snd_seq_dummy hci_vhci bluetooth cuse userio ecdh_generic uhid vhost_vsock vmw_vsock_virtio_transport_common vhost_net vhost tap vsock snd_seq snd_seq_device snd_timer snd soundcore binfmt_misc sctp quota_v2 quota_tree nls_iso8859_1 ntfs xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter idt_89hpesx vmx_crypto ipmi_powernv ofpart cmdlinepart ipmi_devintf powernv_flash ipmi_msghandler mtd at24 ibmpowernv opal_prd powernv_rng input_leds joydev mac_hid uio_pdrv_genirq uio kvm_hv kvm sch_fq_codel nfsd ib_iser rdma_cm iw_cm auth_rpcgss ib_cm nfs_acl
[11012.514561] lockd iscsi_tcp libiscsi_tcp grace libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hid_generic ses enclosure scsi_transport_sas usbhid hid ib_core ast i2c_algo_bit ttm drm_kms_helper qla2xxx syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_cor...


Joseph Salisbury (jsalisbury) wrote :
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Confirmed → Incomplete
Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-17 05:15 EDT-------
(In reply to comment #20)
> Can you next test -rc5 and -rc7:
>
> v4.16-rc5: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc5/
> v4.16-rc7: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc7/

Issue is observed with v4.16-rc5.

ation_type=exited termination_id=0 corefile=no
cutime=178 cstime=728
<<<test_end>>>
<<<test_start>>>
tag=read_all_sys stime=1523953347
cmdline="read_all -d /sys -q -r 10"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:987: INFO: Timeout per run is 0h 05m 00s
[10613.713802] Unable to handle kernel paging request for data at address 0x000000c8
[10613.713882] Faulting instruction address: 0xd000000016c5def8
[10613.713923] Oops: Kernel acc[10919.189083579,5] OPAL: Switch to big-endian OS
ess of bad ar[10924.025833843,5] OPAL: Switch to little-endian OS

Thanks,
Pavithra

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-17 09:47 EDT-------
Note, the logs are showing a kernel assert (BUG), not a crash. The assert is that something is wrong with the address/length being used to copy data in/out of userspace. We need to look closer at what is being done by the test suite.
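
The address/length test behind this assert can be modelled outside the kernel. What follows is a simplified Python sketch of the range-overlap check that the usercopy hardening applies to the kernel text region; the real logic lives in mm/usercopy.c, and the text bounds and pointer values below are illustrative placeholders, not values taken from this dump. Only the length (1023 bytes) comes from the log.

```python
def overlaps(ptr, n, low, high):
    """True if the object [ptr, ptr + n) intersects the region [low, high)."""
    check_low = ptr
    check_high = ptr + n
    # No overlap when the object lies entirely above or entirely below.
    return not (check_low >= high or check_high <= low)


# Placeholder bounds for the kernel text region (assumed, not from the dump).
TEXT_START = 0xC000000000000000
TEXT_END = 0xC000000001000000

LENGTH = 1023  # byte count reported in the usercopy message

inside = overlaps(TEXT_START, LENGTH, TEXT_START, TEXT_END)
outside = overlaps(TEXT_END + 0x1000, LENGTH, TEXT_START, TEXT_END)
straddling = overlaps(TEXT_START - 10, LENGTH, TEXT_START, TEXT_END)
print(inside, outside, straddling)
```

Any copy to userspace whose source range intersects the text region is rejected, including one that only partly overlaps it.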

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-17 13:13 EDT-------
The logs show a "read_all" process, presumably part of the test suite, that is trying to read kernel text memory (via /dev/mem) and triggering the BUG. It is not clear whether the attempt should have failed gracefully. I'd assume the expectation is that an illegal read from /dev/mem should return an error rather than panic.
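
From the userspace side, that expectation looks like the following: a reader that opens each node, attempts one short read, and records an errno when the kernel refuses. This is a minimal Python sketch of that pattern only; it is not the LTP read_all implementation, try_read is a hypothetical helper, and the demo uses a temporary directory rather than /dev. The 1023-byte read size matches the length in the usercopy message.

```python
import errno
import os
import tempfile


def try_read(path, size=1023):
    """Attempt one non-blocking read and report the outcome instead of raising."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        return ("open failed", errno.errorcode.get(exc.errno, str(exc.errno)))
    try:
        data = os.read(fd, size)
        return ("read ok", len(data))
    except OSError as exc:
        return ("read failed", errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        os.close(fd)


# Demo on scratch files so that the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as scratch:
    present = os.path.join(scratch, "present")
    with open(present, "wb") as handle:
        handle.write(b"x" * 2000)
    ok_result = try_read(present)
    missing_result = try_read(os.path.join(scratch, "missing"))

print(ok_result, missing_result)
```

A refused read shows up here as an ordinary error tuple, which is the behaviour the comment above assumes a test reading /dev/mem should see.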

Comment #21 shows a different panic, and since no logs or vmcore are provided I can't tell any more about it. If this latest panic is the only reason the bug is open, we need to move that to a new bugzilla as it is not the same issue.

------- Comment From <email address hidden> 2018-04-17 13:15 EDT-------
Comment #19 also shows a different issue than originally reported. So, it seems this bugzilla has deviated from the original problem.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-19 04:19 EDT-------
Crash is observed even on rc7 kernel.

[10423.094236] LTP: starting ftest07
[10423.483052] LTP: starting ftest08
[10423.580683] LTP: starting lftest01 (lftest 100)
[10423.659185] LTP: starting writetest01 (writetest)
[10423.761643] LTP: starting fs_di (fs_di -d $TMPDIR)
[10429.845325] LTP: starting proc01 (proc01 -m 128)
[10430.745829] ICMPv6: process `proc01' is using deprecated sysctl (syscall) net.ipv6.neigh.default.base_reachable_time - use net.ipv6.neigh.default.base_reachable_time_ms instead
[10432.423033] LTP: starting read_all_dev (read_all -d /dev -e '/dev/watchdog?(0)' -q -r 10)
[10432.504812] NET: Registered protocol family 40
[10432.530266] Bluetooth: Core ver 2.22
[10432.530367] NET: Registered protocol family 31
[10432.530406] Bluetooth: HCI device and connection manager initialized
[10432.530448] Bluetooth: HCI socket layer initialized
[10432.530482] Bluetooth: L2CAP socket layer initialized
[10432.530522] Bluetooth: SCO socket layer initialized
[10732.410904] LTP: starting read_all_proc (read_all -d /proc -q -r 10)
[10732.940975] ICMPv6: process `read_all' is using deprecated sysctl (syscall) net.ipv6.neigh.default.retrans_time - use net.ipv6.neigh.default.retrans_time_ms instead
[10734.936296] LTP: starting read_all_sys (read_all -d /sys -q -r 10)
[10735.449465] Unable to handle kernel paging request for data at address 0x000000c8
[10735.449563] Faulting instruction address: 0xd00000001588df98
[10735.449606] Oops: Kernel access of bad area, sig: 11 [#1]
[10735.449642] LE SMP NR_CPUS=2048 NUMA PowerNV
[10735.449677] Modules linked in: snd_seq_dummy hci_vhci bluetooth cuse ecdh_generic userio uhid vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost tap snd_seq snd_seq_device snd_timer snd soundcore binfmt_misc sctp quota_v2 quota_tree nls_iso8859_1 ntfs xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter idt_89hpesx vmx_crypto ofpart ipmi_powernv cmdlinepart ipmi_devintf powernv_flash ipmi_msghandler ibmpowernv mtd at24 powernv_rng opal_prd uio_pdrv_genirq uio joydev input_leds mac_hid kvm_hv kvm sch_fq_codel nfsd ib_iser rdma_cm iw_cm auth_rpcgss ib_cm nfs_acl
[10735.450121] iscsi_tcp libiscsi_tcp lockd libiscsi grace scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas mlx5_ib hid_generic usbhid hid ib_core ast i2c_algo_bit ttm qla2xxx drm_kms_helper syscopyarea mlx5_core sysfillrect sysimgblt fb_sys_fops drm nvme_fc nvme_fabrics mlxfw ahci crct10dif_vpmsum crc32c_vpmsum i40e devlink scsi_transport_fc megaraid_sas libahci drm_panel_orientation_quirks
[10735.450441] CPU: 31 PID: 3798 Comm: read_all Tainted: G W 4.16.0-041600rc7-generic #201803252030
[10735.450504] NIP: d00000001588df98 LR: c000000000414fb8 CTR: d00000001588dee8
[1...


bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-19 06:51 EDT-------
There are two different panics being shown here. One is the kernel assert in usercopy.c, the other is the crash in qla2xxx. You should not be using one bug to handle two different issues. If the kernel assert is no longer happening, then close this bug.
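
The two panics can be told apart by their first oops line. Here is a small Python sketch that buckets console lines by signature, assuming only the two message forms that appear in this bug; SIGNATURES and classify are illustrative names.

```python
import re

# One pattern per failure signature seen in this bug's console logs.
SIGNATURES = [
    ("kernel assert (BUG)", re.compile(r"kernel BUG at (?P<where>\S+?)!")),
    (
        "bad memory access",
        re.compile(
            r"Unable to handle kernel paging request for data "
            r"at address (?P<where>0x[0-9a-f]+)"
        ),
    ),
]


def classify(line):
    """Return (label, location) for a known oops signature, else None."""
    for label, pattern in SIGNATURES:
        match = pattern.search(line)
        if match:
            return (label, match.group("where"))
    return None


# Lines copied from the console logs in this bug.
assert_line = (
    "[10798.374988] kernel BUG at "
    "/build/linux-2BXDjB/linux-4.15.0/mm/usercopy.c:72!"
)
fault_line = (
    "[10735.449465] Unable to handle kernel paging request "
    "for data at address 0x000000c8"
)
assert_result = classify(assert_line)
fault_result = classify(fault_line)
print(assert_result, fault_result)
```

Logs that fall into different buckets would normally be tracked in separate bugs, as suggested in this comment.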

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-24 02:38 EDT-------
(In reply to comment #26)
> There are two different panics being shown here. One is the kernel assert in
> usercopy.c, the other is the crash in qla2xxx. You should not be using one
> bug to handle two different issues. If the kernel assert is no longer
> happening, then close this bug.

@Pavithra: can you verify that the assert issue is no longer observed? As per the developer's suggestion, you can open a new defect for the crash issue.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-30 15:20 EDT-------
I tried to reproduce this on my machine and was not able to. I am using kernel 4.15.0-20-generic and I ran:

# ltp-install pwd
/home/breno/ltp-install

# ltp-install sudo ./runltp

Is there any other way to reproduce it?
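
One narrower option, offered as an assumption rather than a confirmed reproducer, is to run only the test that was active when the BUG fired. The command line below is copied from the "LTP: starting read_all_dev" console message in the original report; the Python sketch only shows how it splits into arguments, and where the read_all binary lives in a given LTP install is not stated in this bug.

```python
import shlex

# Copied from the "LTP: starting read_all_dev (...)" console message.
logged = "read_all -d /dev -e '/dev/watchdog?(0)' -q -r 10"

# Split the way a POSIX shell would, keeping the quoted pattern as one argument.
argv = shlex.split(logged)
print(argv)
```

The exclusion pattern after -e has to reach the program as a single argument, which is why it is quoted in the logged command.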

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-02 01:02 EDT-------
(In reply to comment #28)
> I tried to reproduce this on my machine and I was not able to. I am using
> kernel 4.15.0-20-generic and I run:
>
> # ltp-install pwd
> /home/breno/ltp-install
>
> # ltp-install sudo ./runltp
>
> Is there any other way to reproduce it?

Issue is not observed on 4.15.0-20-generic kernel. We can close the bug.

[1]+ Exit 1 nohup ./runltp
root@ltc-briggs1:/opt/ltp# uname -a
Linux ltc-briggs1 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-briggs1:/opt/ltp# tail -n 30 nohup.out
tag=cve-2018-5803_2 stime=1525093814
cmdline="sctp_big_chunk -a 10000"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s
sctp_big_chunk.c:53: INFO: sctp server listen on 38843
sctp_big_chunk.c:68: INFO: bind 10000 additional IP addresses
sctp_big_chunk.c:108: PASS: test doesn't cause crash

Summary:
passed 1
failed 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=5 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=520
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20180118-267-g4fbb02a1d

###############################################################

Done executing testcases.
LTP Version: 20180118-267-g4fbb02a1d
###############################################################

Frank Heimes (fheimes) wrote :

closing this bug (setting it to Invalid) according to comment #23

Frank Heimes (fheimes)
Changed in linux (Ubuntu Bionic):
status: Incomplete → Invalid
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in ubuntu-power-systems:
status: Incomplete → Invalid
tags: removed: triage-g