Server reboots every 4.1 weeks

Xenial (16.04)
Bug #1653498

Bug #1653498 reported by emmecerre on 2017-01-02

This bug affects 5 people

Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  High        Unassigned
Xenial           Confirmed  High        Unassigned

Bug Description

After roughly 4.1 weeks of uptime, each of our 34 servers reboots with the logs below.

Description: Ubuntu 16.04.1 LTS
Release: 16.04

The servers host a stack of flanneld (0.5.5), docker (1.11.2, build b9f10c9), kubernetes (v1.3.6), and etcd (2.3.7).

Jan 02 06:40:32 prd-node021 kernel: ------------[ cut here ]------------
Jan 02 06:40:32 prd-node021 kernel: kernel BUG at /build/linux-xHzv4a/linux-4.4.0/include/linux/fs.h:2569!
Jan 02 06:40:32 prd-node021 kernel: invalid opcode: 0000 [#1] SMP
Jan 02 06:40:32 prd-node021 kernel: Modules linked in: nf_conntrack_netlink nfnetlink veth xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp tcp_diag inet_diag
Jan 02 06:40:32 prd-node021 kernel: raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc
Jan 02 06:40:32 prd-node021 kernel: CPU: 46 PID: 22749 Comm: iptables-restor Not tainted 4.4.0-47-generic #68-Ubuntu
Jan 02 06:40:32 prd-node021 kernel: Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016
Jan 02 06:40:32 prd-node021 kernel: task: ffff882f7ccb44c0 ti: ffff882fcb810000 task.ti: ffff882fcb810000
Jan 02 06:40:32 prd-node021 kernel: RIP: 0010:[<ffffffff8120f9ed>] [<ffffffff8120f9ed>] __fput+0x21d/0x220
Jan 02 06:40:32 prd-node021 kernel: RSP: 0018:ffff882fcb813e68 EFLAGS: 00010246
Jan 02 06:40:32 prd-node021 kernel: RAX: 0000000000000000 RBX: ffff8829dbd92700 RCX: 00000000308c6396
Jan 02 06:40:32 prd-node021 kernel: RDX: 0000000000000001 RSI: ffff88301f299f60 RDI: 0000000000000000
Jan 02 06:40:32 prd-node021 kernel: RBP: ffff882fcb813ea0 R08: 0000000000019f60 R09: ffffffff811b3a1d
Jan 02 06:40:32 prd-node021 kernel: R10: ffffea0064756680 R11: ffff8829dbd92710 R12: 0000000000000010
Jan 02 06:40:32 prd-node021 kernel: R13: ffff8817d3520518 R14: ffff881021a34da0 R15: ffff8817d3542540
Jan 02 06:40:32 prd-node021 kernel: FS: 00007fcdaa753700(0000) GS:ffff88301f280000(0000) knlGS:0000000000000000
Jan 02 06:40:32 prd-node021 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 02 06:40:32 prd-node021 kernel: CR2: 00007fcdaa758000 CR3: 0000001917993000 CR4: 00000000003406e0
Jan 02 06:40:32 prd-node021 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 02 06:40:32 prd-node021 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 02 06:40:32 prd-node021 kernel: Stack:
Jan 02 06:40:32 prd-node021 kernel: ffff8817d3520518 ffff8829dbd92710 ffff882f7ccb44c0 ffffffff82103a30
Jan 02 06:40:32 prd-node021 kernel: ffff8829dbd92700 0000000000000000 ffff882f7ccb4b38 ffff882fcb813eb0
Jan 02 06:40:32 prd-node021 kernel: ffffffff8120fa2e ffff882fcb813ef0 ffffffff8109ee01 ffff882f7ccb4b6c
Jan 02 06:40:32 prd-node021 kernel: Call Trace:
Jan 02 06:40:32 prd-node021 kernel: [<ffffffff8120fa2e>] ____fput+0xe/0x10
Jan 02 06:40:32 prd-node021 kernel: [<ffffffff8109ee01>] task_work_run+0x81/0xa0
Jan 02 06:40:32 prd-node021 kernel: [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
Jan 02 06:40:32 prd-node021 kernel: [<ffffffff81003c6e>] syscall_return_slowpath+0x4e/0x60
Jan 02 06:40:32 prd-node021 kernel: [<ffffffff81835150>] int_ret_from_sys_call+0x25/0x8f
Jan 02 06:40:32 prd-node021 kernel: Code: 0f 84 cf fe ff ff 48 8b 43 28 48 8b 80 80 00 00 00 48 85 c0 0f 84 bb fe ff ff 31 d2 48 89 de bf ff ff ff ff ff d0 e9 aa fe ff ff <0f>
Jan 02 06:40:32 prd-node021 kernel: RIP [<ffffffff8120f9ed>] __fput+0x21d/0x220
-- Reboot --
Jan 02 06:42:56 prd-node021 systemd-journald[819]: Runtime journal (/run/log/journal/) is 8.0M, max 1.8G, 1.8G free.
Jan 02 06:42:56 prd-node021 kernel: Initializing cgroup subsys cpuset

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-47-generic 4.4.0-47.68
ProcVersionSignature: Ubuntu 4.4.0-47.68-generic 4.4.24
Uname: Linux 4.4.0-47-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jan 2 06:42 seq
crw-rw---- 1 root audio 116, 33 Jan 2 06:42 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Jan 2 10:49:15 2017
HibernationDevice: RESUME=/dev/mapper/vg00-swap
InstallationDate: Installed on 2016-09-13 (110 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Beta amd64 (20160803)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-47-generic root=/dev/mapper/vg00-root ro cgroup_enable=memory swapaccount=1
RelatedPackageVersions:
linux-restricted-modules-4.4.0-47-generic N/A
linux-backports-modules-4.4.0-47-generic N/A
linux-firmware 1.157.5
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/13/2016
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP89:bd09/13/2016:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL360 Gen9
dmi.sys.vendor: HP

emmecerre (manuelcarlo-ranieri) wrote on 2017-01-02:

CRDA.txt (422 bytes, text/plain; charset="utf-8")
CurrentDmesg.txt (441.4 KiB, text/plain; charset="utf-8")
Dependencies.txt (2.7 KiB, text/plain; charset="utf-8")
JournalErrors.txt (137.2 KiB, text/plain; charset="utf-8")
Lspci.txt (241.7 KiB, text/plain; charset="utf-8")
Lsusb.txt (471 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (62.5 KiB, text/plain; charset="utf-8")
ProcEnviron.txt (98 bytes, text/plain; charset="utf-8")
ProcInterrupts.txt (95.6 KiB, text/plain; charset="utf-8")
ProcModules.txt (5.3 KiB, text/plain; charset="utf-8")
UdevDb.txt (308.1 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (527.0 KiB, text/plain; charset="utf-8")

Brad Figg (brad-figg) wrote on 2017-01-02: Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

Colin Ian King (colin-king) wrote on 2017-01-02:

A possibly interesting data point: the CAL counters (function call interrupts) in /proc/interrupts are approaching a 32-bit wraparound.
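
The wraparound observation can be checked on a live node. Below is a minimal Python sketch, assuming the CAL counters are unsigned 32-bit values as the comment implies. The function names and the 90% threshold are illustrative and do not come from any existing tool. It also computes the event rate that would reach 2**32 in a given uptime; for 4.1 weeks that is roughly 1,730 events per second. Whether this wraparound is actually related to the crash is not established in this thread.

```python
WRAP = 2**32  # an unsigned 32-bit counter wraps here (assumption from the comment above)


def cal_counts(interrupts_text):
    """Return the per-CPU CAL counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "CAL:":
            counts = []
            # Per-CPU counts are the leading numeric fields after the label;
            # the trailing description ("Function call interrupts") ends the run.
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def near_wrap(counts, threshold=0.9):
    """Return (cpu_index, count) pairs at or above threshold * 2**32."""
    return [(cpu, n) for cpu, n in enumerate(counts) if n >= threshold * WRAP]


def rate_to_wrap(weeks):
    """Events per second that would reach 2**32 after `weeks` of uptime."""
    return WRAP / (weeks * 7 * 24 * 3600)
```

On a node, `near_wrap(cal_counts(open("/proc/interrupts").read()))` lists the CPUs whose CAL counter is close to the limit.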

emmecerre (manuelcarlo-ranieri) wrote on 2017-01-03:

I thought about this, but I have no idea which buffer fills up.

Joseph Salisbury (jsalisbury) on 2017-01-05

Changed in linux (Ubuntu):
importance:	Undecided → High
Changed in linux (Ubuntu Xenial):
importance:	Undecided → High
status:	New → Confirmed

Joseph Salisbury (jsalisbury) wrote on 2017-01-05:

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc2

emmecerre (manuelcarlo-ranieri) wrote on 2017-01-10:

Hi Joseph Salisbury,
I installed the upstream kernel, but it lacks the AUFS support that our stack needs.
I also tried to compile the kernel with the upstream AUFS, but the AUFS available online is too old: the patches do not apply cleanly and the build fails.

I don't know whether it makes sense to try an older kernel such as v4.8 or v4.9 (if the compile works...).

emmecerre (manuelcarlo-ranieri) wrote on 2017-01-10:

In the meantime, I compiled the upstream 4.9.2 kernel with the Ubuntu patches and AUFS.
Let me know if it makes sense to test with this kernel version.

From-Nibly (thatoneemail) wrote on 2017-02-03:

What is the latest on this? We are running into a similar problem with kubernetes, except that our most recent occurrence came after only 13 days. Did upgrading the kernel to 4.9.2 work?

emmecerre (manuelcarlo-ranieri) wrote on 2017-02-07:

We installed the 4.9.2 kernel a few days ago. We are still waiting for a crash :)
For now, we have about 3.1 weeks of uptime.
I'll keep you posted.

emmecerre (manuelcarlo-ranieri) wrote on 2017-02-27:

#10

The server with kernel 4.9.2 rebooted after 2.874 weeks.

emmecerre (manuelcarlo-ranieri) wrote on 2017-02-27:

#11

kernel-bug-exists-upstream

tags:

added: kernel-bug-exists-upstream

emmecerre (manuelcarlo-ranieri) on 2017-04-25

tags:

added: confirmed

Andrii (angr) wrote on 2019-06-11:

#12

Hello,

We have the same issue on our k8s cluster.

Description:    Ubuntu 16.04.3 LTS
Release:        16.04

Docker: 17.03.2-ce
Kubernetes: v1.10.11
Kernel: 4.4.0-145-generic, 4.4.0-109-generic

[5493817.706210] ------------[ cut here ]------------
[5493817.707602] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/include/linux/fs.h:2585!
[5493817.708284] invalid opcode: 0000 [#1] SMP
[5493817.708935] Modules linked in: xt_set xt_multiport iptable_raw iptable_mangle ip_set_hash_net veth nf_conntrack_netlink xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set nfnetlink ip_vs ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs vmw_balloon vmw_vsock_vmci_transport vsock joydev input_leds serio_raw shpchp i2c_piix4 vmw_vmci mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq
[5493817.714295]  async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel vmwgfx aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ttm psmouse drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3 drm vmw_pvscsi ahci libahci pata_acpi fjes
[5493817.716942] CPU: 5 PID: 13487 Comm: iptables-save Not tainted 4.4.0-145-generic #171-Ubuntu
[5493817.717597] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[5493817.718885] task: ffff8800bb16d400 ti: ffff88045f38c000 task.ti: ffff88045f38c000
[5493817.719627] RIP: 0010:[<ffffffff8121df13>]  [<ffffffff8121df13>] __fput+0x223/0x230
[5493817.720314] RSP: 0018:ffff88045f38fe78  EFLAGS: 00010246
[5493817.721030] RAX: 0000000000000000 RBX: ffff88005f20e200 RCX: 00000000b86bfbc9
[5493817.721714] RDX: 0000000000000001 RSI: ffffe8ffffd51350 RDI: 0000000000000000
[5493817.722383] RBP: ffff88045f38feb0 R08: 000060f7c0011350 R09: ffffffff811beded
[5493817.723038] R10: ffffea000a9f3f00 R11: ffff88005f20e210 R12: 0000000000000010
[5493817.723695] R13: ffff880816bd11a8 R14: ffff8808130a1520 R15: ffff880816aaed80
[5493817.724371] FS:  00007f6087f8b700(0000) GS:ffff88083fd40000(0000) knlGS:0000000000000000
[5493817.725147] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5493817.725835] CR2: 00007f6085f6d1f0 CR3: 00000004621e2000 CR4: 0000000000060670
[5493817.726518] Stack:
[5493817.727156]  ffff880816bd11a8 ffff88005f20e210 ffffffff82113f30 ffff8800bb16daa8
[5493817.727806]  ffff8800bb16d400 0000000000000000 ffff8800bb16d400 ffff88045f38fec0
[5493817.728508]  ffffffff8121df5e ffff88045f38fef0 ffffffff810a4895 0000000000000002
[5493817.729138] Call Trace:
[5493817.729845]  [<ffffffff8121df5e>] ____fput+0xe/0x10
[5493817.730511]  [<ffffffff810a4895>] task_work_run+0x95/0xb0
[5493817.731222]  [<ffffffff81003532>] exit_to_usermode_loop+0xc2/0xd0
[5493817.731875]  [<ffffffff81003c7e>] syscall_return_slowpath+0x4e/0x60
[5493817.732579]  [<ffffffff818629d3>] int_ret_from_sys_call+0x25/0xa3
[5493817.733214] Code: fe ff ff 48 8b 43 28 48 8b 80 80 00 00 00 48 85 c0 0f 84 b8 fe ff ff 31 d2 48 89 de bf ff ff ff ff e8 32 a5 64 00 e9 a4 fe ff ff <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 31 ff 48
[5493817.735013] RIP  [<ffffffff8121df13>] __fput+0x223/0x230
[5493817.735582]  RSP <ffff88045f38fe78>
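
To compare this trace with the one in the original report, it helps to pull out the same few fields from each oops: the BUG location, the command name, the kernel version, and the faulting symbol. The following is a minimal Python sketch; the function name and the choice of fields are illustrative, and the regular expressions are written only against the log lines shown in this report, so other oops formats may need adjustments.

```python
import re

# Each pattern targets one line of the oops text as it appears in this report.
PATTERNS = [
    re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!"),
    re.compile(r"Comm: (?P<comm>\S+)"),
    re.compile(r"[Tt]ainted\b.* (?P<kernel>\S+) #\d+"),
    # Matches the final "RIP  [<addr>] symbol+off/len" line, not "RIP: 0010:...".
    re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(?P<rip>\S+)"),
]


def summarize_oops(text):
    """Return a dict with the fields found in a kernel oops log."""
    summary = {}
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            summary.update(match.groupdict())
    return summary


sample = (
    "[5493817.707602] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/include/linux/fs.h:2585!\n"
    "[5493817.716942] CPU: 5 PID: 13487 Comm: iptables-save Not tainted 4.4.0-145-generic #171-Ubuntu\n"
    "[5493817.735013] RIP  [<ffffffff8121df13>] __fput+0x223/0x230\n"
)
print(summarize_oops(sample))
```

Running the same function over the original report's log gives a BUG in the same header (fs.h) and the same faulting function (__fput), with a different line number and kernel build.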

Andrii (angr) wrote on 2019-06-11:

#13

A similar problem is described on GitHub: https://github.com/kubernetes/kubernetes/issues/70229

Brad Figg (brad-figg) on 2019-07-24

tags:

added: ubuntu-certified

chudihuang (chudihuang) wrote on 2020-07-01:

#17

Hi,

We have the same issue on our k8s cluster.

Description: Ubuntu 16.04.1 LTS x86_64
Release: 16.04.1

Kernel: 4.4.0-104-generic

The dump file can be downloaded as follows:
wget http://129.226.115.161/dump.202006231820.tar.gz

I did some analysis, but I still have not found the root cause.

Load the vmcore in crash (see the download link above). Crash should present details similar to the following:
crash> bt
PID: 11388  TASK: ffff880eb1f79e00  CPU: 29  COMMAND: "heartbeat"
 #0 [ffff8809131a7b08] machine_kexec at ffffffff8105c22b
 #1 [ffff8809131a7b68] crash_kexec at ffffffff8110e852
 #2 [ffff8809131a7c38] oops_end at ffffffff81031c49
 #3 [ffff8809131a7c60] die at ffffffff810320fb
 #4 [ffff8809131a7c90] do_trap at ffffffff8102f121
 #5 [ffff8809131a7ce0] do_error_trap at ffffffff8102f4a9
 #6 [ffff8809131a7da0] do_invalid_op at ffffffff8102fa10
 #7 [ffff8809131a7db0] invalid_op at ffffffff8184638e
    [exception RIP: __fput+541]
    RIP: ffffffff812126ad  RSP: ffff8809131a7e68  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880ef6915700  RCX: 0000000365fb1705
    RDX: 0000000000000001  RSI: ffff880fff55a020  RDI: 0000000000000000
    RBP: ffff8809131a7ea0   R8: 000000000001a020   R9: ffffffff811b591d
    R10: ffffea002b69b300  R11: ffff880ef6915710  R12: 0000000000000010
    R13: ffff880ed152aef8  R14: ffff8800bba18aa0  R15: ffff880ed1513a40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8809131a7e60] __fput at ffffffff812125ac
 #9 [ffff8809131a7ea8] ____fput at ffffffff812126ee
#10 [ffff8809131a7eb8] task_work_run at ffffffff8109f101
#11 [ffff8809131a7ef8] exit_to_usermode_loop at ffffffff81003242
#12 [ffff8809131a7f30] syscall_return_slowpath at ffffffff81003c6e
#13 [ffff8809131a7f50] int_ret_from_sys_call at ffffffff818449d0
    RIP: 000000000047f704  RSP: 000000c423b77c98  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 000000000047f704
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00000000000000ca
    RBP: 000000c423b77ce0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 000000c423b78ee0  R15: 0000000000000008
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
crash>

crash> log
[19156101.592212] ------------[ cut here ]------------
[19156101.593103] kernel BUG at /build/linux-SwhOyu/linux-4.4.0/include/linux/fs.h:2582!
[19156101.594385] invalid opcode: 0000 [#1] SMP
[19156101.595083] Modules linked in: binfmt_misc af_packet_diag netlink_diag dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth br_netfilter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_set xt_mark ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport ip_set dummy xt_comment xt_addrtype iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_tcpudp bridge stp llc nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack aufs isofs ppdev crct10dif_pclmul parport_pc crc32_pclmul input_leds joydev ghash_clmulni_intel parport serio_raw ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov
[19156101.606434]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse floppy
[19156101.609129] CPU: 29 PID: 11388 Comm: heartbeat Not tainted 4.4.0-104-generic #127-Ubuntu
[19156101.610384] Hardware name: Smdbmds KVM, BIOS seabios-1.9.1-qemu-project.org 04/01/2014
[19156101.611637] task: ffff880eb1f79e00 ti: ffff8809131a4000 task.ti: ffff8809131a4000
[19156101.612905] RIP: 0010:[<ffffffff812126ad>]  [<ffffffff812126ad>] __fput+0x21d/0x220
[19156101.614188] RSP: 0018:ffff8809131a7e68  EFLAGS: 00010246
[19156101.614989] RAX: 0000000000000000 RBX: ffff880ef6915700 RCX: 0000000365fb1705
[19156101.616143] RDX: 0000000000000001 RSI: ffff880fff55a020 RDI: 0000000000000000
[19156101.617285] RBP: ffff8809131a7ea0 R08: 000000000001a020 R09: ffffffff811b591d
[19156101.618422] R10: ffffea002b69b300 R11: ffff880ef6915710 R12: 0000000000000010
[19156101.619574] R13: ffff880ed152aef8 R14: ffff8800bba18aa0 R15: ffff880ed1513a40
[19156101.620785] FS:  000000c42085bc90(0000) GS:ffff880fff540000(0000) knlGS:0000000000000000
[19156101.622074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19156101.622921] CR2: 00007f508b166b04 CR3: 0000000e0981b000 CR4: 00000000003406e0
[19156101.624062] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19156101.625210] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19156101.626349] Stack:
[19156101.626765]  ffff880ed152aef8 ffff880ef6915710 ffff880eb1f79e00 ffffffff8210ad50
[19156101.628018]  ffff880ef6915700 0000000000000000 ffff880eb1f7a4a0 ffff8809131a7eb0
[19156101.629305]  ffffffff812126ee ffff8809131a7ef0 ffffffff8109f101 ffff880eb1f7a4d4
[19156101.630568] Call Trace:
[19156101.631036]  [<ffffffff812126ee>] ____fput+0xe/0x10
[19156101.631804]  [<ffffffff8109f101>] task_work_run+0x81/0xa0
[19156101.632612]  [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[19156101.633497]  [<ffffffff81003c6e>] syscall_return_slowpath+0x4e/0x60
[19156101.634402]  [<ffffffff818449d0>] int_ret_from_sys_call+0x25/0x8f
[19156101.635285] Code: 0f 84 cf fe ff ff 48 8b 43 28 48 8b 80 80 00 00 00 48 85 c0 0f 84 bb fe ff ff 31 d2 48 89 de bf ff ff ff ff ff d0 e9 aa fe ff ff <0f> 0b 90 0f 1f 44 00 00 31 ff 48 87 3d 8a 6e fc 00 48 85 ff 74
[19156101.639244] RIP  [<ffffffff812126ad>] __fput+0x21d/0x220
[19156101.640049]  RSP <ffff8809131a7e68>

Looking up the location reported above, fs.h:2582, in the source code, we see the panic is due to a BUG_ON:

static void __fput(struct file *file)
{
        /* ... */
        if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
                i_readcount_dec(inode);
        /* ... */
}

static inline void i_readcount_dec(struct inode *inode)
{
        BUG_ON(!atomic_read(&inode->i_readcount));
        atomic_dec(&inode->i_readcount);
}
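
BUG_ON(cond) traps when cond is true, so the check in i_readcount_dec() fires only when the counter is already zero on entry. The following toy Python model illustrates that invariant. It is not kernel code: the class name is made up, and the assumption that every decrement is paired with an earlier increment made when the file was opened read-only is mine, not something shown in this report.

```python
class ReadCount:
    """Toy model of inode->i_readcount (illustrative only, not kernel code)."""

    def __init__(self):
        self.counter = 0

    def inc(self):
        # Assumed to happen once per read-only open.
        self.counter += 1

    def dec(self):
        # Mirrors BUG_ON(!atomic_read(&inode->i_readcount)): trap if already 0.
        if self.counter == 0:
            raise RuntimeError("BUG_ON: i_readcount is already 0")
        self.counter -= 1


rc = ReadCount()
rc.inc()   # read-only open
rc.dec()   # matching release: fine, counter returns to 0
try:
    rc.dec()   # unbalanced release: this is the case the BUG_ON catches
    trapped = False
except RuntimeError:
    trapped = True
print("unbalanced release trapped:", trapped)
```

In this model the trap means a release happened without a matching increment still outstanding. How the counter reached zero on the affected servers is not established in this thread.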

And the corresponding disassembly for the panic location:

crash> dis -r __fput+541
...
0xffffffff812126a1 <__fput+529>:        mov    $0xffffffff,%edi
0xffffffff812126a6 <__fput+534>:        callq  *%rax
0xffffffff812126a8 <__fput+536>:        jmpq   0xffffffff81212557 <__fput+199>
0xffffffff812126ad <__fput+541>:        ud2
crash>
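
Note that crash prints symbol offsets in decimal (__fput+541) while the oops log prints them in hex with the function length appended (__fput+0x21d/0x220). A small Python sketch, with illustrative helper names, shows that both refer to the same instruction and recovers the start address of __fput from the faulting RIP:

```python
def parse_symbol_offset(text):
    """Split 'symbol+offset' into (symbol, offset); accepts decimal or 0x-hex offsets."""
    symbol, _, offset = text.partition("+")
    offset = offset.split("/")[0]  # drop the '/0x220' length suffix used in oops output
    return symbol, int(offset, 0)  # base 0 lets int() honour the 0x prefix


def function_start(address, offset):
    """Start address of the function, given an address inside it and its offset."""
    return address - offset


crash_form = parse_symbol_offset("__fput+541")
oops_form = parse_symbol_offset("__fput+0x21d/0x220")
start = function_start(0xFFFFFFFF812126AD, crash_form[1])
print(crash_form == oops_form, hex(start), hex(start + 199))
```

The recovered start address plus 199 matches the jmpq target 0xffffffff81212557 <__fput+199> in the listing, which confirms the offsets are being read consistently.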

Execution jumped to the ud2 that caused the panic. Where did it jump from?

crash> dis  __fput | grep __fput+541
0xffffffff81212638 <__fput+424>:        je     0xffffffff812126ad <__fput+541>
0xffffffff812126ad <__fput+541>:        ud2
crash>

And the assembly before the je:

crash> dis  __fput | grep __fput+541 -B3
0xffffffff8121262e <__fput+414>:        retq
0xffffffff8121262f <__fput+415>:        mov    0x154(%r13),%eax
0xffffffff81212636 <__fput+422>:        test   %eax,%eax
0xffffffff81212638 <__fput+424>:        je     0xffffffff812126ad <__fput+541>

Here r13 is likely the inode pointer, so 0x154(%r13) is inode.i_readcount:

crash> struct inode.i_readcount -xo
struct inode {
  [0x154] atomic_t i_readcount;
}
crash>

r13 is ffff880ed152aef8, so inode.i_readcount read from the dump at that address is 111:

crash> bt | grep R13
    R13: ffff880ed152aef8  R14: ffff8800bba18aa0  R15: ffff880ed1513a40

crash> inode.i_readcount.counter ffff880ed152aef8
  i_readcount.counter = 111
crash>

inode.i_readcount.counter is not equal to 0, so why did BUG_ON fire?
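
One detail in the dump may bear on this question. In the disassembly, the value loaded from 0x154(%r13) goes into eax, is tested, and the je lands directly on the ud2, so nothing modifies eax between the load and the trap. The RAX saved in the exception frame should therefore be the value the check actually saw. The following Python sketch (helper names are illustrative) extracts the registers from the bt output above and computes the address that was read:

```python
import re

# Register block of the exception frame, copied from the `bt` output above.
BT_FRAME = """
    RIP: ffffffff812126ad  RSP: ffff8809131a7e68  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880ef6915700  RCX: 0000000365fb1705
    RDX: 0000000000000001  RSI: ffff880fff55a020  RDI: 0000000000000000
    RBP: ffff8809131a7ea0   R8: 000000000001a020   R9: ffffffff811b591d
    R10: ffffea002b69b300  R11: ffff880ef6915710  R12: 0000000000000010
    R13: ffff880ed152aef8  R14: ffff8800bba18aa0  R15: ffff880ed1513a40
"""

I_READCOUNT_OFFSET = 0x154  # from `struct inode.i_readcount -xo` above


def registers(bt_text):
    """Collect 'NAME: value' register pairs, keeping the first occurrence of each."""
    regs = {}
    for name, value in re.findall(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b", bt_text):
        regs.setdefault(name, int(value, 16))
    return regs


regs = registers(BT_FRAME)
tested_value = regs["RAX"] & 0xFFFFFFFF          # eax, the operand of `test %eax,%eax`
readcount_address = regs["R13"] + I_READCOUNT_OFFSET
print("value tested:", tested_value)
print("address read:", hex(readcount_address))
```

RAX is 0 in the saved frame, which is consistent with the counter reading as 0 at the moment of the check. If that reading is right, the 111 seen in the vmcore would be a later state of that memory, for example after updates from other CPUs between the check and the dump. This is an interpretation of the data shown here, not something confirmed in this thread.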
