Comment 2 for bug 1449162

Varun Lodaya (varun-lodaya) wrote : Re: [Bug 1449162] Re: Seeing kernel backtrace and vrouter process stuck in INIT state with contrail 2.01 and kernel 3.13

Hey Anand,

It's kind of tricky to figure out after the reboot whether there was a
memory leak. We are plotting memory usage on the existing hypervisors to
see whether they are leaking. Is there anything specific you want us to
check?
Btw, we have seen more hypervisors with kernel 3.13 crash in the last 2
days.
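
In case it helps, here is a rough sketch of the kind of sampling we have in mind. It is not an existing tool. It assumes the leak, if there is one, would show up in the SUnreclaim (or Slab) field of /proc/meminfo, which we have not confirmed. The function names, the field choice and the sampling interval are all placeholders.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB
    for the fields that carry a kB unit)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def growth_per_hour(samples):
    """Least-squares slope of (seconds, kB) samples, expressed in kB per hour.

    A slope that stays positive over a long window hints at a steady leak;
    it is not proof of one.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return num / denom * 3600.0


def sample_field(field="SUnreclaim", count=5, interval=60.0,
                 path="/proc/meminfo"):
    """Read one meminfo field `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append((time.time(), parse_meminfo(f.read())[field]))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Running growth_per_hour(sample_field()) on a hypervisor gives a rough kB/hour trend for unreclaimable slab memory. A much longer window than the defaults would be needed before reading anything into the number.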

Thanks,
Varun

On 4/28/15, 2:21 AM, "Anand H. Krishnan" <email address hidden> wrote:

>Hi
>
>Can you please check whether there is a steady memory leak before this
>crash happened? It is difficult to see what has gone wrong without a
>dump.
>
>Thanks,
>
>--
>You received this bug notification because you are subscribed to the bug
>report.
>https://bugs.launchpad.net/bugs/1449162
>
>Title:
> Seeing kernel backtrace and vrouter process stuck in INIT state with
> contrail 2.01 and kernel 3.13
>
>Status in OpenContrail:
> New
>
>Bug description:
> We are seeing frequent kernel backtraces with contrail 2.01-41 running
> on kernel version 3.13. The only way to recover after this is a
> hypervisor reboot. We need to dig into the root cause, as this is
> seriously affecting our uptime.
>
> Following is the backtrace:
> 2015-04-25T12:28:05.944331+00:00 b0c010ash2018 kernel: [535885.190684]
>BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> 2015-04-25T12:28:05.944348+00:00 b0c010ash2018 kernel: [535885.215308]
>IP: [<ffffffff811b0657>] kmem_cache_alloc+0x77/0x1f0
> 2015-04-25T12:28:05.944349+00:00 b0c010ash2018 kernel: [535885.230779]
>PGD 12df28d067 PUD 12d69d0067 PMD 0
> 2015-04-25T12:28:05.944350+00:00 b0c010ash2018 kernel: [535885.247356]
>Oops: 0000 [#1] SMP
> 2015-04-25T12:28:05.944351+00:00 b0c010ash2018 kernel: [535885.264374]
>Modules linked in: veth vhost_net macvtap macvlan vhost xt_conntrack
>iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>nf_nat nf_conntrack 8021q mrp garp bridge stp llc ip6table_filter
>ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nbd
>vrouter(OX) ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
>iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding
>x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_multipath
>crct10dif_pclmul crc32_pclmul dcdbas ghash_clmulni_intel scsi_dh xfs
>gpio_ich joydev ipmi_devintf aesni_intel msr ablk_helper lp mac_hid
>cryptd sb_edac edac_core lrw gf128mul parport mei_me wmi ioatdma lpc_ich
>mei ipmi_si glue_helper libcrc32c shpchp acpi_power_meter aes_x86_64
>hid_generic usbhid hid ixgbe igb megaraid_sas dca i2c_algo_bit ptp
>pps_core mdio
> 2015-04-25T12:28:05.944354+00:00 b0c010ash2018 kernel: [535885.575624]
>CPU: 28 PID: 45796 Comm: openstack-statu Tainted: G OX
>3.13.0-49-generic #81~precise1-Ubuntu
> 2015-04-25T12:28:05.944355+00:00 b0c010ash2018 kernel: [535885.654294]
>Hardware name: Dell Inc. PowerEdge R720xd/0X3D66, BIOS 2.4.3 07/09/2014
> 2015-04-25T12:28:05.944356+00:00 b0c010ash2018 kernel: [535885.737038]
>task: ffff8812f2ce0000 ti: ffff8812c70fe000 task.ti: ffff8812c70fe000
> 2015-04-25T12:28:05.944356+00:00 b0c010ash2018 kernel: [535885.826571]
>RIP: 0010:[<ffffffff811b0657>] [<ffffffff811b0657>]
>kmem_cache_alloc+0x77/0x1f0
> 2015-04-25T12:28:05.944357+00:00 b0c010ash2018 kernel: [535885.923753]
>RSP: 0018:ffff8812c70ffd90 EFLAGS: 00010282
> 2015-04-25T12:28:05.944358+00:00 b0c010ash2018 kernel: [535885.974387]
>RAX: 0000000000000000 RBX: 0000000001200011 RCX: 000000000003e8bc
> 2015-04-25T12:28:05.944359+00:00 b0c010ash2018 kernel: [535886.078298]
>RDX: 000000000003e8bb RSI: 00000000000000d0 RDI: 00000000000162a0
> 2015-04-25T12:28:05.944360+00:00 b0c010ash2018 kernel: [535886.188858]
>RBP: ffff8812c70ffde0 R08: ffff88181fbd62a0 R09: ffffffff8108be94
> 2015-04-25T12:28:05.944363+00:00 b0c010ash2018 kernel: [535886.302853]
>R10: ffff88187fffbf00 R11: 00007fff8841b000 R12: 0000000000000001
> 2015-04-25T12:28:05.944365+00:00 b0c010ash2018 kernel: [535886.418492]
>R13: ffff88181f403900 R14: ffff88181f403900 R15: 00000000000000d0
> 2015-04-25T12:28:05.944387+00:00 b0c010ash2018 kernel: [535886.536915]
>FS: 00007fbc18506700(0000) GS:ffff88181fbc0000(0000)
>knlGS:0000000000000000
> 2015-04-25T12:28:05.944388+00:00 b0c010ash2018 kernel: [535886.656231]
>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2015-04-25T12:28:05.944393+00:00 b0c010ash2018 kernel: [535886.716245]
>CR2: 0000000000000001 CR3: 00000012d6b18000 CR4: 00000000001427e0
> 2015-04-25T12:28:05.944394+00:00 b0c010ash2018 kernel: [535886.833304]
>Stack:
> 2015-04-25T12:28:05.944395+00:00 b0c010ash2018 kernel: [535886.889899]
>ffffffff8108be94 ffff8817ef6a0aa8 ffff8817f7189880 0000000000000000
> 2015-04-25T12:28:05.944395+00:00 b0c010ash2018 kernel: [535887.004037]
>ffff8812c70ffde0 0000000001200011 0000000000000000 ffffffff81c452a0
> 2015-04-25T12:28:05.944396+00:00 b0c010ash2018 kernel: [535887.117912]
>0000000000000000 0000000000000000 ffff8812c70ffe10 ffffffff8108be94
> 2015-04-25T12:28:05.944397+00:00 b0c010ash2018 kernel: [535887.232613]
>Call Trace:
> 2015-04-25T12:28:05.944397+00:00 b0c010ash2018 kernel: [535887.288242]
>[<ffffffff8108be94>] ? alloc_pid+0x24/0x2e0
> 2015-04-25T12:28:05.944400+00:00 b0c010ash2018 kernel: [535887.344143]
>[<ffffffff8108be94>] alloc_pid+0x24/0x2e0
> 2015-04-25T12:28:05.944401+00:00 b0c010ash2018 kernel: [535887.398737]
>[<ffffffff8106998a>] copy_process.part.27+0x8ca/0xf50
> 2015-04-25T12:28:05.944401+00:00 b0c010ash2018 kernel: [535887.452662]
>[<ffffffff8110331b>] ? audit_filter_rules.isra.7+0x55b/0xad0
> 2015-04-25T12:28:05.944402+00:00 b0c010ash2018 kernel: [535887.506223]
>[<ffffffff8106a090>] copy_process+0x80/0x90
> 2015-04-25T12:28:05.944402+00:00 b0c010ash2018 kernel: [535887.558678]
>[<ffffffff8106a1d2>] do_fork+0x62/0x280
> 2015-04-25T12:28:05.944403+00:00 b0c010ash2018 kernel: [535887.610125]
>[<ffffffff81103924>] ? audit_filter_syscall+0x94/0xe0
> 2015-04-25T12:28:05.944412+00:00 b0c010ash2018 kernel: [535887.661481]
>[<ffffffff8106a476>] SyS_clone+0x16/0x20
> 2015-04-25T12:28:05.944413+00:00 b0c010ash2018 kernel: [535887.711599]
>[<ffffffff8176ead9>] stub_clone+0x69/0x90
> 2015-04-25T12:28:05.944414+00:00 b0c010ash2018 kernel: [535887.760582]
>[<ffffffff8176e77d>] ? system_call_fastpath+0x1a/0x1f
> 2015-04-25T12:28:05.944414+00:00 b0c010ash2018 kernel: [535887.808776]
>Code: 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 5e 01 00 00
>48 85 c0 0f 84 55 01 00 00 49 63 45 20 49 8b 7d 00 48 8d 4a 01 <49> 8b 1c
>04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 b7 49 63
> 2015-04-25T12:28:05.944415+00:00 b0c010ash2018 kernel: [535887.954161]
>RIP [<ffffffff811b0657>] kmem_cache_alloc+0x77/0x1f0
> 2015-04-25T12:28:05.944416+00:00 b0c010ash2018 kernel: [535888.001499]
>RSP <ffff8812c70ffd90>
> 2015-04-25T12:28:05.944416+00:00 b0c010ash2018 kernel: [535888.047379]
>CR2: 0000000000000001
> 2015-04-25T12:28:05.944419+00:00 b0c010ash2018 kernel: [535888.157161]
>---[ end trace 8e74a782f5824da3 ]---
> 2015-04-25T12:28:06.040139+00:00 b0c010ash2018 kernel: [535888.207652]
>[sched_delayed] sched: RT throttling activated
>
>To manage notifications about this bug go to:
>https://bugs.launchpad.net/opencontrail/+bug/1449162/+subscriptions