On Friday, October 09, 2015 at 06:59, Oskar Liljeblad wrote:

> > > To see if it is the cause of this issue, I built a test kernel with a
> > > revert of commit 97b2591. The test kernel can be downloaded from:
> > >
> > > http://kernel.ubuntu.com/~jsalisbury/lp1499203/
[..]
> The 3.13.0-66.107~lp1445195Commit97b2591Reverted kernel seem to work just
> fine. No memory leaks as far as I can see.

By the way, I had to downgrade from the kernel above to 3.13.0-65.106 on one server because of some strange IO lockup issues. I'm afraid this won't be of much help, but I'm reporting it anyway. It started about 1 minute after boot with the new kernel:

Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106544] BUG: unable to handle kernel NULL pointer dereference at (null)
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106592] IP: [] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106624] PGD 1f72db067 PUD 1fa753067 PMD 0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106659] Oops: 0000 [#1] SMP
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106684] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106848] CPU: 1 PID: 1286 Comm: mongod Not tainted 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106884] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106923] task: ffff8801f722c800 ti: ffff8801f72ce000 task.ti: ffff8801f72ce000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106950] RIP: 0010:[] [] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106986] RSP: 0018:ffff8801f72cfe78 EFLAGS: 00010246
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107006] RAX: 0000000000000000 RBX: ffff8801f775e300 RCX: 0000000040000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107032] RDX: 0000000001000000 RSI: 0000000000000000 RDI: ffffffff81c72e80
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107058] RBP: ffff8801f72cfea0 R08: 0000000000000000 R09: 0000000000000001
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107084] R10: ffff8801f775ece1 R11: 0000000000000293 R12: 0000000000000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107110] R13: ffff8801f775ece1 R14: ffff8801f775ee40 R15: ffff8801f775e3b0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107137] FS: 00007f23b299f700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107190] CR2: 0000000000000000 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107224] Stack:
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107235] ffff8801f775e300 0000000000000010 ffff8801f775ece1 ffff8801f775ee40
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107270] ffff880036927a40 ffff8801f72cfee8 ffffffff811bfb7a ffffffff8133ed81
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107302] ffff8801fa8bbe30 0000000000000000 ffffffff81ebb680 ffff8801f722ce20
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107336] Call Trace:
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107353] [] __fput+0x24a/0x260
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107375] [] ? blkdev_issue_flush+0x71/0x90
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107400] [] ____fput+0xe/0x10
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107421] [] task_work_run+0xa7/0xe0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107444] [] do_notify_resume+0x97/0xb0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107468] [] int_signal+0x12/0x17
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107491] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 48 c7 c7 80 2e c7 81 49 81 c7 b0 00 00 00 41 56 41 55 41 54 53 e8 b8 30 52 00 49 8b 07 <48> 8b 08 49 39 c7 4c 8d 60 a8 48 8d 59 a8 75 0b eb 3e 0f 1f 00
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107648] RIP [] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107675] RSP
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107689] CR2: 0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107717] ---[ end trace 87deccc21e1958fa ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210565] ------------[ cut here ]------------
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210612] kernel BUG at /home/jsalisbury/bugs/lp1499203/ubuntu-trusty/mm/rmap.c:1035!
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210642] invalid opcode: 0000 [#2] SMP
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210663] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210796] CPU: 1 PID: 1771 Comm: mongod Tainted: G D 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210834] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210873] task: ffff8801f7713000 ti: ffff8801fafa4000 task.ti: ffff8801fafa4000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210900] RIP: 0010:[] [] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210939] RSP: 0018:ffff8801fafa59e8 EFLAGS: 00010246
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210960] RAX: 0000000000000000 RBX: ffffea00079a2340 RCX: ffffffffffffffe8
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210986] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880207ff4f00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.211021] RBP: ffff8801fafa59e8 R08: 00000000fffffff9 R09: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.212294] R10: 000000000000000c R11: 00000000003e9480 R12: 00007f084a5619e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214126] R13: 0000000000000000 R14: ffff8801f775e300 R15: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] FS: 00007f084a561700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] CR2: 00007f084a5619e0 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Stack:
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff8801fafa5a18 ffffffff8118464a 00007f084a5619e0 ffff8800f78ea290
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff8801f775e300 ffff8801fa652300 ffff8801fafa5ab0 ffffffff8117a708
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff880035aab300 0000000035aab300 0000000000000000 0000000000001f4a
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Call Trace:
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_page_add_anon_rmap+0x10a/0x120
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] handle_mm_fault+0xcf8/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] __do_page_fault+0x184/0x560
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? update_cfs_shares+0xb1/0x100
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? __enqueue_entity+0x78/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? enqueue_entity+0x2ad/0xbb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? native_sched_clock+0x13/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? enqueue_task_fair+0x422/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_page_fault+0x1a/0x70
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] page_fault+0x28/0x30
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? __get_user_8+0x1f/0x29
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? exit_robust_list+0x32/0x130
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] mm_release+0x123/0x140
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_exit+0x153/0xa40
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_group_exit+0x3f/0xa0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] get_signal_to_deliver+0x1d0/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_signal+0x48/0xa10
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] ? handle_mm_fault+0x482/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] do_notify_resume+0x69/0xb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [] retint_signal+0x48/0x86
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Code: c4 40 74 03 8b 4f 68 bf 00 10 00 00 48 d3 e7 e8 2d 58 a7 ff 5d c3 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 89 f2 be 00 80 00 00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] RIP [] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] RSP
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.249727] ---[ end trace 87deccc21e1958fb ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.251013] Fixing recursive fault but reboot is needed!

After that, all IO on that device was stuck. I rebooted the server and the issue occurred again, with basically the same messages logged.

Regards,

Oskar Liljeblad