Kernel Oops - unable to handle kernel paging request; RIP is at free_pipe_info+0x57/0x90

Bug #1709626 reported by Erik Hoeschler
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
linux (Ubuntu)    Invalid  High        Unassigned
  Xenial          Invalid  High        Unassigned

Bug Description

Hi,

I'm running a Docker swarm with 5 nodes. Each node hits this kernel oops fairly frequently, I would say about once a day. I can't reproduce it reliably, but it seems to happen when a Docker stack consisting of approx. 40 containers gets deployed.

Thanks in advance!

Description: Ubuntu 16.04.3 LTS
Release: 16.04

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-87-generic 4.4.0-87.110
ProcVersionSignature: Ubuntu 4.4.0-87.110-generic 4.4.73
Uname: Linux 4.4.0-87-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Aug 9 10:23 seq
 crw-rw---- 1 root audio 116, 33 Aug 9 10:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
Date: Wed Aug 9 15:31:53 2017
HibernationDevice: RESUME=/dev/mapper/vg00-lv_swap
InstallationDate: Installed on 2016-06-01 (434 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-87-generic root=/dev/mapper/vg00-lv_root ro cgroup_enable=memory swapaccount=1
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-87-generic N/A
 linux-backports-modules-4.4.0-87-generic N/A
 linux-firmware 1.157.11
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/21/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Erik Hoeschler (erhoe) wrote :

Full trace:

[14900.892616] BUG: unable to handle kernel paging request at 000000010000000d
[14900.892667] IP: [<ffffffff81218467>] free_pipe_info+0x57/0x90
[14900.892703] PGD 480386067 PUD 0
[14900.892724] Oops: 0000 [#1] SMP
[14900.892744] Modules linked in: xt_REDIRECT nf_nat_redirect cfg80211 seqiv iptable_raw xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 drbg ansi_cprng nfsv3 nfs_acl nfs lockd grace fscache ip_vs_rr xt_ipvs ip_vs xt_nat xt_tcpudp veth vxlan ip6_udp_tunnel udp_tunnel iptable_mangle xt_mark mmfs26(OE) mmfslinux(OE) tracedev(OE) ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc vmw_vsock_vmci_transport vsock binfmt_misc ppdev vmw_balloon joydev input_leds serio_raw i2c_piix4 shpchp vmw_vmci 8250_fintek parport_pc parport mac_hid ib_iser rdma_cm sunrpc iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
[14900.893180] libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear vmwgfx ttm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd syscopyarea sysfillrect sysimgblt mptspi fb_sys_fops mptscsih psmouse mptbase drm e1000 vmxnet3 scsi_transport_spi pata_acpi fjes
[14900.893417] CPU: 0 PID: 23905 Comm: exe Tainted: G OE 4.4.0-87-generic #110-Ubuntu
[14900.893457] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[14900.893507] task: ffff8806110fd400 ti: ffff88046ec60000 task.ti: ffff88046ec60000
[14900.893543] RIP: 0010:[<ffffffff81218467>] [<ffffffff81218467>] free_pipe_info+0x57/0x90
[14900.893583] RSP: 0018:ffff88046ec63df8 EFLAGS: 00010202
[14900.893612] RAX: 00000000fffffffd RBX: 0000000000000008 RCX: 000000000000051f
[14900.893645] RDX: 0000000000000028 RSI: ffff880097ab9d40 RDI: ffff88059027b8c0
[14900.893678] RBP: ffff88046ec63e08 R08: 0000000000000000 R09: 0000000000000000
[14900.893710] R10: ffff8805f041cb70 R11: ffff88008e1a2210 R12: ffff88059027b8c0
[14900.894621] R13: ffff8805f041cb70 R14: ffff8808198c4c20 R15: ffff88050b2d66c0
[14900.895468] FS: 00007f1b2d37b700(0000) GS:ffff88081d600000(0000) knlGS:0000000000000000
[14900.896343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14900.897214] CR2: 000000010000000d CR3: 00000004212ee000 CR4: 00000000000406f0
[14900.898159] Stack:
[14900.899048] ffff8805f041cbf8 ffff88059027b8c0 ffff88046ec63e30 ffffffff812184fc
[14900.899973] ffff88059027b8c0 ffff88008e1a2200 ffff8805f041cb70 ffff88046ec63e58
[14900.900913] ffffffff812185b0 ffff88008e1a2200 0000000000000010 ffff8805f041cb70
[14900.901859] Call Trace:
[14900.902807] [<ffffffff812184fc>] put_pipe_info+0x5c/0x70
[14900.904173] [<ffffffff812185b0>] pipe_release+0xa0/0xb0
[14900.905139] [<ffffffff81211154>] __fput+0xe4/0x220
[14900.906071] [<ffffffff812112ce>] ____fput+0xe/0x10
[14900.907022] [<ffffffff8109f091>] task_work_run+0x81/0xa0
[1...

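The RIP line reports the faulting instruction as free_pipe_info+0x57/0x90, which in kernel oops output means 0x57 bytes into free_pipe_info, a function 0x90 bytes long. The register dump also shows that CR2 (the faulting address, 0x10000000d, matching the "paging request at" line) is exactly RAX (0xfffffffd) plus 0x10. That is consistent with, but does not prove, a load at offset 0x10 through a bad pointer held in RAX. The small sketch below decodes both; parse_symbol_offset and fault_delta are hypothetical helper names for illustration, not part of any kernel tooling.

```python
import re
from typing import NamedTuple


class SymbolOffset(NamedTuple):
    symbol: str
    offset: int  # bytes from the start of the function to the faulting instruction
    size: int    # total length of the function in bytes


# Matches the "symbol+0xOFFSET/0xSIZE" form printed in oops lines and call traces.
_SYM_RE = re.compile(r"^([A-Za-z0-9_.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)$")


def parse_symbol_offset(text: str) -> SymbolOffset:
    """Split e.g. 'free_pipe_info+0x57/0x90' into name, offset and size."""
    m = _SYM_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not in symbol+0xOFF/0xSIZE form: {text!r}")
    return SymbolOffset(m.group(1), int(m.group(2), 16), int(m.group(3), 16))


def fault_delta(cr2: str, reg: str) -> int:
    """Return CR2 minus a register value; both are hex strings as printed in the oops."""
    return int(cr2, 16) - int(reg, 16)


if __name__ == "__main__":
    rip = parse_symbol_offset("free_pipe_info+0x57/0x90")
    print(rip)  # SymbolOffset(symbol='free_pipe_info', offset=87, size=144)
    # Values copied from the CR2 and RAX lines of the trace above.
    print(hex(fault_delta("000000010000000d", "00000000fffffffd")))  # 0x10
```

The offset/size pair only locates the instruction; mapping it to a source line still needs the matching debug symbols for 4.4.0-87-generic.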

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc4

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Erik Hoeschler (erhoe) wrote :

Hey Joseph,

I will test the latest upstream kernel.

I think the problem did not occur on 4.4.0-71-generic. That kernel had a different problem, which was fixed in 4.4.0-87-generic, so we switched to that version.

Will update this report in a few days.

Thanks in advance
Erik

Erik Hoeschler (erhoe) wrote :

Hey Joseph,

I set up two new Docker hosts with the upstream kernel you mentioned in your last post, and they have been working fine for several days now. Today I tried updating the environment that runs into the kernel oops, but got a compile error on the GPFS kernel module:

/usr/lpp/mmfs/src/include/gpl-linux/verdep.h:177:2: error: #error "The host might be using *Ubuntu mainline kernel*, it is not supported by Ubuntu and not tested by GPFS. Please switch back to *Ubuntu distribution kernel*"
 #error "The host might be using *Ubuntu mainline kernel*, it is not supported by Ubuntu and not tested by GPFS. Please switch back to *Ubuntu distribution kernel*"

This means I cannot test the mainline kernel together with our cluster filesystem (GPFS), so I cannot flag this bug with either of the tags you mentioned.

What can we do now?

Regards,
Erik

Joseph Salisbury (jsalisbury) wrote :

We should be able to perform a kernel bisect to identify the commit that introduced this regression. You say this bug did not exist in 4.4.0-71, but it does exist in 4.4.0-87. To perform a bisect, we need to identify the last kernel version that did not have the bug and the first that did. Can you test the following kernel:

4.4.0-85: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12845707
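The bisect described above (find the last good and the first bad kernel) is a binary search over an ordered list of builds. A minimal sketch, assuming the builds are ordered oldest to newest, the oldest is known good (4.4.0-71 in this report), the newest is known bad (4.4.0-87), and the bug never disappears once introduced. find_first_bad and the is_bad callback are hypothetical names; in practice each probe means booting a test kernel and watching for the oops. The same search applies to a list of git commits once the range is narrowed to two adjacent releases.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad(versions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first entry of versions for which is_bad() is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and the
    list is ordered so that every entry after the first bad one is also bad.
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    good, bad = 0, len(versions) - 1  # invariant: versions[good] good, versions[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return versions[bad]


if __name__ == "__main__":
    # Hypothetical placeholders: ABI numbers 71..87 and a fake test verdict.
    # The real list would hold only the builds that actually exist.
    candidates = list(range(71, 88))
    first_bad = find_first_bad(candidates, lambda abi: abi >= 80)
    print(f"first bad build: 4.4.0-{first_bad}")
```

Because this oops shows up only about once a day, a "good" verdict is only as trustworthy as the time the kernel was left running; a false good sends the search into the wrong half.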

Changed in linux (Ubuntu):
importance: Medium → High
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
status: New → Confirmed
tags: added: performing-bisect
Erik Hoeschler (erhoe) wrote :

Hi,

Today we switched to kernel 4.4.0-31-generic; IBM told us to use this kernel with GPFS. We'll see if it helps.

I'll update this post next week.

Regards,
Erik

Erik Hoeschler (erhoe) wrote :

Hey Joseph,

Since we switched to kernel 4.4.0-31-generic, everything has been working fine. I think we can close this bug report, because the bug is related to the GPFS kernel modules, which weren't tested properly on 4.4.0-87-generic.

Thanks for your support!
Regards,
Erik

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Xenial):
status: Confirmed → Invalid