oops in 4.4.0-62-generic (ppc64le)

Bug #1714859 reported by Daniel Black on 2017-09-04
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned
Xenial
Medium
Unassigned
Zesty
Medium
Unassigned
Artful
Medium
Unassigned

Bug Description

Sep 1 14:23:30 p87 kernel: [17274563.423972] device vnet2 entered promiscuous mode
Sep 1 14:23:30 p87 kernel: [17274563.436101] br0: port 4(vnet2) entered forwarding state
Sep 1 14:23:30 p87 kernel: [17274563.436113] br0: port 4(vnet2) entered forwarding state
Sep 1 14:23:31 p87 kernel: [17274564.005034] audit: type=1400 audit(1504239811.140:793): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" pid=18952 comm="apparmor_parser"
Sep 1 14:23:31 p87 kernel: [17274564.019911] audit: type=1400 audit(1504239811.156:794): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" pid=18952 comm="apparmor_parser"
Sep 1 14:23:31 p87 kernel: [17274564.324291] KVM guest htab at c0000079a1000000 (order 29), LPID 3
Sep 1 14:23:45 p87 kernel: [17274578.483572] br0: port 4(vnet2) entered forwarding state
Sep 1 14:26:53 p87 kernel: [17274766.572284] br0: port 4(vnet2) entered disabled state
Sep 1 14:26:53 p87 kernel: [17274766.579930] device vnet2 left promiscuous mode
Sep 1 14:26:53 p87 kernel: [17274766.579932] br0: port 4(vnet2) entered disabled state
Sep 1 14:26:54 p87 kernel: [17274767.303109] audit: type=1400 audit(1504240014.441:795): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" pid=22668 comm="apparmor_parser"
Sep 1 14:26:56 p87 kernel: [17274768.917394] audit: type=1400 audit(1504240016.053:796): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" pid=22672 comm="apparmor_parser"
Sep 1 14:26:56 p87 kernel: [17274768.917673] audit: type=1400 audit(1504240016.053:797): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" pid=22672 comm="apparmor_parser"
Sep 1 14:26:56 p87 kernel: [17274768.982783] device vnet2 entered promiscuous mode
Sep 1 14:26:56 p87 kernel: [17274768.994891] br0: port 4(vnet2) entered forwarding state
Sep 1 14:26:56 p87 kernel: [17274768.994902] br0: port 4(vnet2) entered forwarding state
Sep 1 14:26:56 p87 kernel: [17274769.535982] audit: type=1400 audit(1504240016.673:798): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" pid=22702 comm="apparmor_parser"
Sep 1 14:26:56 p87 kernel: [17274769.546719] audit: type=1400 audit(1504240016.685:799): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" pid=22702 comm="apparmor_parser"
Sep 1 14:26:57 p87 kernel: [17274769.861666] KVM guest htab at c0000079a1000000 (order 29), LPID 3
Sep 1 14:27:11 p87 kernel: [17274784.050382] br0: port 4(vnet2) entered forwarding state
Sep 1 14:28:30 p87 kernel: [17274863.605174] Unable to handle kernel paging request for data at address 0x00000008
Sep 1 14:28:30 p87 kernel: [17274863.605188] Faulting instruction address: 0xc00000000044edcc
Sep 1 14:28:30 p87 kernel: [17274863.605195] Oops: Kernel access of bad area, sig: 11 [#1]
Sep 1 14:28:30 p87 kernel: [17274863.605199] SMP NR_CPUS=2048 NUMA PowerNV
Sep 1 14:28:30 p87 kernel: [17274863.605206] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables binfmt_misc veth xfrm_user xfrm_algo vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter overlay bridge stp llc kvm_hv uio_pdrv_genirq powernv_rng ibmpowernv vmx_crypto leds_powernv uio ipmi_powernv ipmi_msghandler kvm_pr kvm autofs4 xfs btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid10 raid1 ses enclosure mlx4_en be2net lpfc mlx4_core vxlan ip6_udp_tunnel scsi_transport_fc udp_tunnel ipr [last unloaded: ebtables]
Sep 1 14:28:30 p87 kernel: [17274863.605294] CPU: 24 PID: 19018 Comm: sshfs Not tainted 4.4.0-62-generic #83-Ubuntu
Sep 1 14:28:30 p87 kernel: [17274863.605300] task: c000001fc9c7c8e0 ti: c000001fe4c10000 task.ti: c000001fe4c10000
Sep 1 14:28:30 p87 kernel: [17274863.605306] NIP: c00000000044edcc LR: c00000000044ede0 CTR: c000000000108b20
Sep 1 14:28:30 p87 kernel: [17274863.605311] REGS: c000001fe4c139a0 TRAP: 0300 Not tainted (4.4.0-62-generic)
Sep 1 14:28:30 p87 kernel: [17274863.605315] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24000424 XER: 00000000
Sep 1 14:28:30 p87 kernel: [17274863.605330] CFAR: c000000000008468 DAR: 0000000000000008 DSISR: 42000000 SOFTE: 1
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR00: c00000000044ede0 c000001fe4c13c20 c0000000015f7600 c000001fda681000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR04: c00000006377fa00 0000000000000001 0000000000000000 0000000000000000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR08: 0000000000000000 c00000006377fa00 c000001fe4c13c90 0000000000000c01
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR12: c000000000108b20 c00000000fb4e400 0000000000000000 0000000000000000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR16: 0000000000000000 0000000000000000 c00000003e213560 0000000000000000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR20: c000001fe4c13c90 c000001fda6815b0 c000001fe4c13ca0 c000001fda681000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR24: ffffffffffffff99 0000000000000400 0000000000000000 c000001fda681000
Sep 1 14:28:30 p87 kernel: [17274863.605330] GPR28: ffffffffffffff99 0000000000000080 0000000000000100 c000001fe4c13c90
Sep 1 14:28:30 p87 kernel: [17274863.605393] NIP [c00000000044edcc] end_requests+0x8c/0xd0
Sep 1 14:28:30 p87 kernel: [17274863.605398] LR [c00000000044ede0] end_requests+0xa0/0xd0
Sep 1 14:28:30 p87 kernel: [17274863.605401] Call Trace:
Sep 1 14:28:30 p87 kernel: [17274863.605405] [c000001fe4c13c20] [c00000000044ede0] end_requests+0xa0/0xd0 (unreliable)
Sep 1 14:28:30 p87 kernel: [17274863.605412] [c000001fe4c13c70] [c00000000044fb90] fuse_abort_conn+0x3a0/0x490
Sep 1 14:28:30 p87 kernel: [17274863.605418] [c000001fe4c13d20] [c00000000044fd48] fuse_dev_release+0xc8/0xd0
Sep 1 14:28:30 p87 kernel: [17274863.605426] [c000001fe4c13d50] [c0000000002e65c0] __fput+0xe0/0x310
Sep 1 14:28:30 p87 kernel: [17274863.605433] [c000001fe4c13db0] [c0000000000e3e50] task_work_run+0xf0/0x130
Sep 1 14:28:30 p87 kernel: [17274863.605440] [c000001fe4c13e00] [c0000000000178d4] do_notify_resume+0xc4/0xd0
Sep 1 14:28:30 p87 kernel: [17274863.605446] [c000001fe4c13e30] [c000000000009838] ret_from_except_lite+0x64/0x68
Sep 1 14:28:30 p87 kernel: [17274863.605451] Instruction dump:
Sep 1 14:28:30 p87 kernel: [17274863.605455] 7d08e878 7d0051ad 40c2fff4 60420000 7d0050a8 7d08f078 7d0051ad 40c2fff4
Sep 1 14:28:30 p87 kernel: [17274863.605464] e9490008 e9090000 7d244b78 7f63db78 <f9480008> f90a0000 f9290000 f9290008
Sep 1 14:28:30 p87 kernel: [17274863.605479] ---[ end trace c395103c4016f0f2 ]---
Sep 1 14:28:30 p87 kernel: [17274863.634036]

make -j 800
on a sshfs connected source directory (the source was a KVM guest running on the local machine). the guest (a BE guest) was rather stalled at the time and ended up hard powering off on it. The oops occurred about the same time.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1714859

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
Daniel Black (daniel-black) wrote :
Download full text (3.8 KiB)

While attempting to reproduce:

With make -j 400 running in another shell:

danielgb@p87:~$ virsh suspend jenkins-ppc64
Domain jenkins-ppc64 suspended

wait over 2 minutes.

danielgb@p87:~$ virsh resume jenkins-ppc64
Domain jenkins-ppc64 resumed

danielgb@p87:~$ virsh list
 Id Name State
----------------------------------------------------
 1 jenkins running
 2 jenkins-ppc64 running

[ 5761.783275] INFO: task cmake:42287 blocked for more than 120 seconds.
[ 5761.783370] Tainted: G E 4.4.0-93-generic #116
[ 5761.783382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5761.783448] cmake D 00003fffac68e13c 0 42287 42278 0x00040000
[ 5761.783453] Call Trace:
[ 5761.783459] [c000005fd6c4b750] [0000000000008000] 0x8000 (unreliable)
[ 5761.783467] [c000005fd6c4b920] [c000000000015cb8] __switch_to+0x1f8/0x350
[ 5761.783473] [c000005fd6c4b970] [c000000000b08a1c] __schedule+0x30c/0x990
[ 5761.783476] [c000005fd6c4ba40] [c000000000b090e8] schedule+0x48/0xc0
[ 5761.783480] [c000005fd6c4ba70] [c000000000b09630] schedule_preempt_disabled+0x20/0x30
[ 5761.783484] [c000005fd6c4ba90] [c000000000b0b7fc] __mutex_lock_slowpath+0xec/0x1f0
[ 5761.783487] [c000005fd6c4bb10] [c000000000b0b978] mutex_lock+0x78/0xa0
[ 5761.783493] [c000005fd6c4bb40] [c0000000002f890c] walk_component+0x2fc/0x420
[ 5761.783496] [c000005fd6c4bbb0] [c0000000002f8bec] link_path_walk+0x1bc/0x660
[ 5761.783499] [c000005fd6c4bc40] [c0000000002f962c] path_lookupat+0xdc/0x1d0
[ 5761.783503] [c000005fd6c4bc90] [c0000000002fb9d8] filename_lookup+0xa8/0x1b0
[ 5761.783507] [c000005fd6c4bdb0] [c0000000002e1f70] SyS_faccessat+0xe0/0x2e0
[ 5761.783511] [c000005fd6c4be30] [c000000000009204] system_call+0x38/0xb4
[ 5761.783515] INFO: task cc1plus:42687 blocked for more than 120 seconds.
[ 5761.783579] Tainted: G E 4.4.0-93-generic #116
[ 5761.783641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5761.783715] cc1plus D 00003fffa515dc2c 0 42687 42686 0x00040000
[ 5761.783719] Call Trace:
[ 5761.783722] [c000005fd6c47720] [7265677368657265] 0x7265677368657265 (unreliable)
[ 5761.783726] [c000005fd6c478f0] [c000000000015cb8] __switch_to+0x1f8/0x350
[ 5761.783729] [c000005fd6c47940] [c000000000b08a1c] __schedule+0x30c/0x990
[ 5761.783732] [c000005fd6c47a10] [c000000000b090e8] schedule+0x48/0xc0
[ 5761.783736] [c000005fd6c47a40] [c000000000b09630] schedule_preempt_disabled+0x20/0x30
[ 5761.783739] [c000005fd6c47a60] [c000000000b0b7fc] __mutex_lock_slowpath+0xec/0x1f0
[ 5761.783743] [c000005fd6c47ae0] [c000000000b0b978] mutex_lock+0x78/0xa0
[ 5761.783746] [c000005fd6c47b10] [c0000000002f890c] walk_component+0x2fc/0x420
[ 5761.783749] [c000005fd6c47b80] [c0000000002f8bec] link_path_walk+0x1bc/0x660
[ 5761.783753] [c000005fd6c47c10] [c0000000002faadc] path_openat+0xac/0x3c0
[ 5761.783756] [c000005fd6c47c90] [c0000000002fca1c] do_filp_open+0xfc/0x170
[ 5761.783759] [c000005fd6c47db0] [c0000000002e3210] do_sys_open+0x1c0/0x3b0
[ 5761.783762] [c000005fd6c47e30] [c000000000009204] system_call+0x38/0xb4
[ 5761.78376...

Read more...

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
status: New → Incomplete
Daniel Black (daniel-black) wrote :
Download full text (24.8 KiB)

I hadn't used sshfs on Power machines before so there isn't a prior good version.

Mainline v4.13 - jenkins-ppc64

steps on host:

virsh start {guest}

sshfs ozlabs@jenkins-ppc64:/home/ozlabs jenkins-ppc64/ -o reconnect,idmap=user

cd jenkins-ppc64/linux

make -j 400 ; # something with a lot of read on sshfs

# on another terminal

virsh suspend jenkins-ppc64

killall -9 cc (or whatever processes is reading)

# resume guest

root@p87:~# virsh resume jenkins-ppc64
Domain jenkins-ppc64 resumed

root@p87:~# virsh list
 Id Name State
----------------------------------------------------
 1 jenkins running
 2 jenkins-ppc64 running

root@p87:~# virsh list
 Id Name State
----------------------------------------------------
 1 jenkins running
 2 jenkins-ppc64 running

root@p87:~# virsh list
 Id Name State
----------------------------------------------------
 1 jenkins running
 2 jenkins-ppc64 running

root@p87:~# virsh list
 Id Name State
----------------------------------------------------
 1 jenkins running
 2 jenkins-ppc64 running

root@p87:~# virsh console jenkins-ppc64
Connected to domain jenkins-ppc64
Escape character is ^]
Shared connection to p87 closed.

No further network connections to p87 could be established.

I happened to have a FSP ipmi sol activate console open:

(large amounts of tg3 0005:09:00.0 enP5p9s0f0: removed - appologies if I missed backtraces from specific CPUs).

root@p87:~# Watchdog CPU:160 detected Hard LOCKUP other CPUS:40
watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [ksmd:1695]
watchdog: BUG: soft lockup - CPU#160 stuck for 22s! [qemu-system-ppc:13057]
INFO: rcu_sched self-detected stall on CPU
        64-...: (2600 ticks this GP) idle=8b2/140000000000001/0 softirq=40596/40596 fqs=1148
         (t=2601 jiffies g=18364 c=18363 q=1057)
INFO: rcu_sched detected stalls on CPUs/tasks:
Watchdog CPU:24 detected Hard LOCKUP other CPUS:160
Watchdog CPU:160 Hard LOCKUP
        40-...: (1 GPs behind) idle=4a2/140000000000000/0 softirq=40283/40285 fqs=1149
        64-...: (2601 ticks this GP) idle=8b2/140000000000002/0 softirq=40596/40596 fqs=1149
Watchdog CPU:8 Hard LOCKUP

Watchdog CPU:8 became unstuck
Watchdog CPU:56 detected Hard LOCKUP other CPUS:112
Watchdog CPU:24 Hard LOCKUP
Watchdog CPU:24 became unstuck
rcu_sched kthread starved for 1074 jiffies! g18364 c18363 f0x0 RCU_GP_DOING_FQS(4) ->state=0x0
Watchdog CPU:112 became unstuck
Watchdog CPU:8 detected Hard LOCKUP other CPUS:64
Watchdog CPU:64 Hard LOCKUP
watchdog: BUG: soft lockup - CPU#64 stuck for 22s! [ksmd:1695]
watchdog: BUG: sof...

Changed in linux (Ubuntu Xenial):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Daniel Black (daniel-black) wrote :

fyi - did more testing - seems to not to affect x86_64 - detailed test should appear in https://lists.ozlabs.org/pipermail/linuxppc-dev soon.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux (Ubuntu Xenial):
status: Confirmed → Triaged
Changed in linux (Ubuntu Zesty):
status: New → Triaged
importance: Undecided → Medium

This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers