[eBay] Server reboots with 3.0.2.1-6 eBay build

Bug #1621816 reported by Mladen Maric
This bug affects 2 people
Affects              Status           Importance   Assigned to          Milestone
Juniper Openstack    Status tracked in Trunk
  R3.0               Fix Committed    Critical     Anand H. Krishnan
  R3.0.2.x           Fix Committed    Critical     Anand H. Krishnan
  R3.1               Fix Committed    Critical     Anand H. Krishnan
  Trunk              Fix Committed    Critical     Anand H. Krishnan

Bug Description

eBay reported server freezes/reboots after the fix for the issue identified in "2016-0829-1122, Tracebacks in DMESG contrail 3.0.2.1 on CentOS 7" was delivered as part of PR1618375 (3.0.2.1-6).
https://review.opencontrail.org/#/c/23763/

This happens with both kernel 3.10.0-327.28.3.el7.x86_64 and the "officially supported" kernel 3.10.0-327.10.1.el7.x86_64.

Core dumps to be uploaded.

dmesg:
[ 3994.317851] BUG: unable to handle kernel paging request at ffff883fccdb8000
[ 3994.317886] IP: [<ffffffffa0605f45>] vr_interface_add_response+0x75/0x170 [vrouter]
[ 3994.317955] PGD 1f32067 PUD 3fcd9e1063 PMD 3fccd65063 PTE 8000003fccdb8161
[ 3994.317977] Oops: 0003 [#1] SMP
[ 3994.317990] Modules linked in: xt_comment xt_multiport vfat fat mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu vhost_net vhost macvtap macvlan tun binfmt_misc vrouter(OE) ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ipmi_devintf pcspkr sb_edac mei_me edac_core
[ 3994.318227] lpc_ich mei shpchp mfd_core ipmi_si ipmi_msghandler wmi tpm_crb acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt drm_kms_helper crct10dif_pclmul ttm crct10dif_common crc32c_intel drm i40e igb ahci vxlan libahci dca ip6_udp_tunnel i2c_algo_bit udp_tunnel libata i2c_core ptp megaraid_sas pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
[ 3994.318374] CPU: 15 PID: 11435 Comm: contrail-vroute Tainted: G OE ------------ 3.10.0-327.10.1.el7.x86_64 #1
[ 3994.318404] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.0.2 03/15/2016
[ 3994.318425] task: ffff881f93150000 ti: ffff881f91968000 task.ti: ffff881f91968000
[ 3994.318445] RIP: 0010:[<ffffffffa0605f45>] [<ffffffffa0605f45>] vr_interface_add_response+0x75/0x170 [vrouter]
[ 3994.318478] RSP: 0018:ffff881f9196b5f0 EFLAGS: 00010206
[ 3994.318493] RAX: 0000000000000041 RBX: ffff883f8c283000 RCX: ffff883fccdb7e00
[ 3994.318513] RDX: 0000000000000040 RSI: ffff883f84818000 RDI: ffff883f8c283000
[ 3994.318532] RBP: ffff881f9196b5f0 R08: 0000000000000000 R09: ffff883fccdb7e00
[ 3994.318551] R10: ffff881ffec07600 R11: ffffffffa05f1593 R12: ffff883faa6c7400
[ 3994.318570] R13: 0000000000000001 R14: ffffffffa061fc60 R15: ffff883fbccc3700
[ 3994.318590] FS: 00002b6d84801700(0000) GS:ffff883ffeae0000(0000) knlGS:0000000000000000
[ 3994.318611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3994.318627] CR2: ffff883fccdb8000 CR3: 0000003f55925000 CR4: 00000000003427e0
[ 3994.318645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3994.318664] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3994.318683] Stack:
[ 3994.318690] ffff881f9196b648 ffffffffa0607bc9 000000000000001b ffff881f9196b630
[ 3994.318718] ffffffffa05f8483 000000003a52b449 ffff883f8c284c00 0000000000000000
[ 3994.318749] ffff883f8c283000 ffffffffa061fc60 ffff883fbccc3700 ffff881f9196b688
[ 3994.318779] Call Trace:
[ 3994.318794] [<ffffffffa0607bc9>] vr_interface_make_req+0x469/0x510 [vrouter]
[ 3994.318818] [<ffffffffa05f8483>] ? lh_zalloc+0x33/0x40 [vrouter]
[ 3994.318841] [<ffffffffa060969e>] vr_interface_req_process+0x1fe/0x2b0 [vrouter]
[ 3994.318864] [<ffffffffa05f43f9>] sandesh_decode_one+0x119/0x200 [vrouter]
[ 3994.318887] [<ffffffffa05f6880>] ? thrift_binary_protocol_init+0x270/0x270 [vrouter]
[ 3994.319638] [<ffffffffa05f6890>] ? thrift_memory_buffer_is_open+0x10/0x10 [vrouter]
[ 3994.320372] [<ffffffffa05f68a0>] ? thrift_memory_buffer_open+0x10/0x10 [vrouter]
[ 3994.321102] [<ffffffffa05f68e0>] ? thrift_memory_buffer_flush+0x10/0x10 [vrouter]
[ 3994.321828] [<ffffffffa05f68b0>] ? thrift_memory_buffer_close+0x10/0x10 [vrouter]
[ 3994.322542] [<ffffffffa05f6970>] ? thrift_memory_buffer_read+0x90/0x90 [vrouter]
[ 3994.323247] [<ffffffffa05f68c0>] ? thrift_memory_buffer_read_end+0x10/0x10 [vrouter]
[ 3994.323951] [<ffffffffa05f68d0>] ? thrift_memory_buffer_write_end+0x10/0x10 [vrouter]
[ 3994.324654] [<ffffffffa05f63b0>] ? thrift_binary_protocol_write_string+0x50/0x50 [vrouter]
[ 3994.325354] [<ffffffffa05f5360>] ? thrift_binary_protocol_write_message_end+0x10/0x10 [vrouter]
[ 3994.326048] [<ffffffffa05f6480>] ? thrift_binary_protocol_write_set_begin+0x10/0x10 [vrouter]
[ 3994.326734] [<ffffffffa05f5350>] ? thrift_binary_protocol_write_bool+0x60/0x60 [vrouter]
[ 3994.327414] [<ffffffffa05f5370>] ? thrift_binary_protocol_write_sandesh_end+0x10/0x10 [vrouter]
[ 3994.328082] [<ffffffffa05f5380>] ? thrift_binary_protocol_write_struct_begin+0x10/0x10 [vrouter]
[ 3994.328732] [<ffffffffa05f61b0>] ? thrift_binary_protocol_read_binary+0xf0/0xf0 [vrouter]
[ 3994.329366] [<ffffffffa05f5390>] ? thrift_binary_protocol_write_struct_end+0x10/0x10 [vrouter]
[ 3994.330153] [<ffffffffa05f53a0>] ? thrift_binary_protocol_write_field_end+0x10/0x10 [vrouter]
[ 3994.330755] [<ffffffffa05f6550>] ? thrift_binary_protocol_write_message_begin+0xd0/0xd0 [vrouter]
[ 3994.331342] [<ffffffffa05f5400>] ? thrift_binary_protocol_write_field_stop+0x60/0x60 [vrouter]
[ 3994.331913] [<ffffffffa05f63d0>] ? thrift_binary_protocol_write_sandesh_begin+0x20/0x20 [vrouter]
[ 3994.332468] [<ffffffffa05f5410>] ? thrift_binary_protocol_write_map_end+0x10/0x10 [vrouter]
[ 3994.333008] [<ffffffffa05f6470>] ? thrift_binary_protocol_write_list_begin+0xa0/0xa0 [vrouter]
[ 3994.333540] [<ffffffffa05f5420>] ? thrift_binary_protocol_write_list_end+0x10/0x10 [vrouter]
[ 3994.334047] [<ffffffffa05f52f0>] ? thrift_binary_protocol_write_u64+0x80/0x80 [vrouter]
[ 3994.334534] [<ffffffffa05f51c0>] ? thrift_protocol_skip+0x270/0x270 [vrouter]
[ 3994.335009] [<ffffffffa05f59c0>] ? thrift_binary_protocol_read_ipv4+0x10/0x10 [vrouter]
[ 3994.335482] [<ffffffffa05f5830>] ? thrift_binary_protocol_read_double+0x10/0x10 [vrouter]
[ 3994.335943] [<ffffffffa05f51f0>] ? thrift_binary_protocol_write_byte+0x30/0x30 [vrouter]
[ 3994.336404] [<ffffffffa05f5bb0>] ? thrift_binary_protocol_read_u16+0x70/0x70 [vrouter]
[ 3994.336859] [<ffffffffa05f5880>] ? thrift_binary_protocol_write_i32+0x50/0x50 [vrouter]
[ 3994.337319] [<ffffffffa05f5270>] ? thrift_binary_protocol_write_i64+0x80/0x80 [vrouter]
[ 3994.337760] [<ffffffffa05f6250>] ? thrift_binary_protocol_write_field_begin+0xa0/0xa0 [vrouter]
[ 3994.338202] [<ffffffffa05f5fc0>] ? thrift_binary_protocol_read_sandesh_begin+0x20/0x20 [vrouter]
[ 3994.338657] [<ffffffffa05f5460>] ? thrift_binary_protocol_write_uuid_t+0x30/0x30 [vrouter]
[ 3994.339105] [<ffffffffa05f6360>] ? thrift_binary_protocol_write_binary+0xb0/0xb0 [vrouter]
[ 3994.339542] [<ffffffffa05f62b0>] ? thrift_binary_protocol_write_ipv4+0x60/0x60 [vrouter]
[ 3994.339963] [<ffffffffa05f6360>] ? thrift_binary_protocol_write_binary+0xb0/0xb0 [vrouter]
[ 3994.340383] [<ffffffffa05f5430>] ? thrift_binary_protocol_write_set_end+0x10/0x10 [vrouter]
[ 3994.340817] [<ffffffffa05f5fa0>] ? thrift_binary_protocol_read_message_begin+0xf0/0xf0 [vrouter]
[ 3994.341239] [<ffffffffa05f5480>] ? thrift_binary_protocol_read_message_end+0x10/0x10 [vrouter]
[ 3994.341660] [<ffffffffa05f5eb0>] ? thrift_binary_protocol_read_string+0xe0/0xe0 [vrouter]
[ 3994.342074] [<ffffffffa05f5470>] ? thrift_binary_protocol_write_double+0x10/0x10 [vrouter]
[ 3994.342489] [<ffffffffa05f5490>] ? thrift_binary_protocol_read_sandesh_end+0x10/0x10 [vrouter]
[ 3994.342896] [<ffffffffa05f54b0>] ? thrift_binary_protocol_read_struct_begin+0x20/0x20 [vrouter]
[ 3994.343295] [<ffffffffa05f5a90>] ? thrift_binary_protocol_read_i16+0x70/0x70 [vrouter]
[ 3994.343693] [<ffffffffa05f54c0>] ? thrift_binary_protocol_read_struct_end+0x10/0x10 [vrouter]
[ 3994.344085] [<ffffffffa05f5c10>] ? thrift_binary_protocol_write_u16+0x60/0x60 [vrouter]
[ 3994.344467] [<ffffffffa05f54d0>] ? thrift_binary_protocol_read_field_end+0x10/0x10 [vrouter]
[ 3994.344847] [<ffffffffa05f5d00>] ? thrift_binary_protocol_read_map_begin+0xf0/0xf0 [vrouter]
[ 3994.345231] [<ffffffffa05f54e0>] ? thrift_binary_protocol_read_map_end+0x10/0x10 [vrouter]
[ 3994.345613] [<ffffffffa05f5dc0>] ? thrift_binary_protocol_read_list_begin+0xc0/0xc0 [vrouter]
[ 3994.345990] [<ffffffffa05f54f0>] ? thrift_binary_protocol_read_list_end+0x10/0x10 [vrouter]
[ 3994.346378] [<ffffffffa05f5500>] ? thrift_binary_protocol_read_set_end+0x10/0x10 [vrouter]
[ 3994.346757] [<ffffffffa05f5570>] ? thrift_binary_protocol_read_bool+0x70/0x70 [vrouter]
[ 3994.347137] [<ffffffffa05f5a20>] ? thrift_binary_protocol_write_i16+0x60/0x60 [vrouter]
[ 3994.347512] [<ffffffffa05f58d0>] ? thrift_binary_protocol_write_u32+0x50/0x50 [vrouter]
[ 3994.347881] [<ffffffffa05f55e0>] ? thrift_binary_protocol_read_byte+0x70/0x70 [vrouter]
[ 3994.348245] [<ffffffffa05f5b40>] ? thrift_binary_protocol_read_field_begin+0xb0/0xb0 [vrouter]
[ 3994.348612] [<ffffffffa05f5940>] ? thrift_binary_protocol_read_i32+0x70/0x70 [vrouter]
[ 3994.348977] [<ffffffffa05f5680>] ? thrift_binary_protocol_read_i64+0xa0/0xa0 [vrouter]
[ 3994.349337] [<ffffffffa05f59b0>] ? thrift_binary_protocol_read_u32+0x70/0x70 [vrouter]
[ 3994.349693] [<ffffffffa05f5720>] ? thrift_binary_protocol_read_u64+0xa0/0xa0 [vrouter]
[ 3994.350163] [<ffffffffa05f5820>] ? thrift_binary_protocol_read_uuid_t+0x30/0x30 [vrouter]
[ 3994.350552] [<ffffffffa05f5dd0>] ? thrift_binary_protocol_read_set_begin+0x10/0x10 [vrouter]
[ 3994.350900] [<ffffffffa05f60c0>] ? thrift_binary_protocol_write_ipaddr+0x100/0x100 [vrouter]
[ 3994.351245] [<ffffffffa05f5dd0>] ? thrift_binary_protocol_read_set_begin+0x10/0x10 [vrouter]
[ 3994.351589] [<ffffffffa05f57f0>] ? thrift_binary_protocol_read_ipaddr+0xd0/0xd0 [vrouter]
[ 3994.351941] [<ffffffffa05f42c0>] ? sandesh_hdr_free+0x10/0x10 [vrouter]
[ 3994.352292] [<ffffffffa05f4596>] sandesh_decode+0x46/0x90 [vrouter]
[ 3994.352645] [<ffffffffa05fe343>] sandesh_proto_decode+0x33/0x50 [vrouter]
[ 3994.353003] [<ffffffffa05fddad>] vr_message_request+0x3d/0x70 [vrouter]
[ 3994.353356] [<ffffffffa05fd323>] netlink_trans_request+0x63/0x1e0 [vrouter]
[ 3994.353706] [<ffffffff81285708>] ? security_capable+0x18/0x20
[ 3994.354055] [<ffffffff81316cc2>] ? nla_parse+0x32/0x120
[ 3994.354405] [<ffffffff8155c0fd>] genl_family_rcv_msg+0x1cd/0x400
[ 3994.354755] [<ffffffff811c106a>] ? kmem_cache_alloc+0x1ba/0x1d0
[ 3994.355104] [<ffffffff8155c330>] ? genl_family_rcv_msg+0x400/0x400
[ 3994.355454] [<ffffffff8155c3c1>] genl_rcv_msg+0x91/0xd0
[ 3994.355807] [<ffffffff8155a329>] netlink_rcv_skb+0xa9/0xc0
[ 3994.356159] [<ffffffff8155a858>] genl_rcv+0x28/0x40
[ 3994.356507] [<ffffffff8155993d>] netlink_unicast+0xed/0x1b0
[ 3994.356864] [<ffffffff81559d30>] netlink_sendmsg+0x330/0x770
[ 3994.357216] [<ffffffff81288935>] ? sock_has_perm+0x75/0x90
[ 3994.357568] [<ffffffff81510dd0>] sock_sendmsg+0xb0/0xf0
[ 3994.357917] [<ffffffff8151149f>] ? sock_recvmsg+0xbf/0x100
[ 3994.358266] [<ffffffff81511209>] ___sys_sendmsg+0x3a9/0x3c0
[ 3994.358618] [<ffffffff810bb685>] ? sched_clock_cpu+0x85/0xc0
[ 3994.358970] [<ffffffff8163a398>] ? __schedule+0x2d8/0x900
[ 3994.359320] [<ffffffff8130c4bd>] ? list_del+0xd/0x30
[ 3994.359668] [<ffffffff8122bc32>] ? eventfd_ctx_read+0x102/0x210
[ 3994.360015] [<ffffffff815120f1>] __sys_sendmsg+0x51/0x90
[ 3994.360370] [<ffffffff81512142>] SyS_sendmsg+0x12/0x20
[ 3994.360726] [<ffffffff81645a49>] system_call_fastpath+0x16/0x1b
[ 3994.361071] Code: 30 48 01 87 b8 00 00 00 85 d2 74 2d 31 c0 0f 1f 84 00 00 00 00 00 4c 8b 46 38 48 8b 8f d8 00 00 00 48 63 d0 83 c0 01 4d 8b 04 d0 <4c> 01 04 d1 8b 15 b1 88 01 00 39 d0 72 dd 89 97 e0 00 00 00 48
[ 3994.361874] RIP [<ffffffffa0605f45>] vr_interface_add_response+0x75/0x170 [vrouter]
[ 3994.362267] RSP <ffff881f9196b5f0>
[ 3994.362651] CR2: ffff883fccdb8000

Tags: vrouter ebay
Mladen Maric (mmaric) wrote :
tags: added: ebay vrouter
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/24033
Submitter: Anand H. Krishnan (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/24033
Committed: http://github.org/Juniper/contrail-vrouter/commit/74bc8f26ecfd99bad9dcb6948d038b5efe8dfa41
Submitter: Zuul
Branch: R3.0.2.x

commit 74bc8f26ecfd99bad9dcb6948d038b5efe8dfa41
Author: Anand H. Krishnan <email address hidden>
Date: Fri Sep 9 19:20:27 2016 +0530

Fix memory allocation for interface request

In order to pass per lcore queue input error statistics to the
application that does vif query, we allocate only VR_MAX_CPUS
worth of memory, but we try to copy vr_num_cpus worth of data.
In the case of vr_num_cpus > VR_MAX_CPUS (64), we will hit a snag.
As a fix, allocate memory for vr_num_cpus.

Change-Id: Ifb1060aac20011b8d51e1b31063a363fe268fd3d
Closes-Bug: #1621816

information type: Proprietary → Public
Changed in juniperopenstack:
importance: Undecided → Critical
assignee: nobody → Anand H. Krishnan (anandhk)
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/24126
Submitter: Anand H. Krishnan (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/24127
Submitter: Anand H. Krishnan (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/24128
Submitter: Anand H. Krishnan (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/24126
Committed: http://github.org/Juniper/contrail-vrouter/commit/e891b0137450e5fa57d2d09e9325fa7cee5126e9
Submitter: Zuul
Branch: master

commit e891b0137450e5fa57d2d09e9325fa7cee5126e9
Author: Anand H. Krishnan <email address hidden>
Date: Fri Sep 9 19:20:27 2016 +0530

Fix memory allocation for interface request

In order to pass per lcore queue input error statistics to the
application that does vif query, we allocate only VR_MAX_CPUS
worth of memory, but we try to copy vr_num_cpus worth of data.
In the case of vr_num_cpus > VR_MAX_CPUS (64), we will hit a snag.
As a fix, allocate memory for vr_num_cpus.

Change-Id: Ifb1060aac20011b8d51e1b31063a363fe268fd3d
Closes-Bug: #1621816

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/24127
Committed: http://github.org/Juniper/contrail-vrouter/commit/3bfe603839ce6fa6e399b31df4c8b58cb1f3ace1
Submitter: Zuul
Branch: R3.1

commit 3bfe603839ce6fa6e399b31df4c8b58cb1f3ace1
Author: Anand H. Krishnan <email address hidden>
Date: Fri Sep 9 19:20:27 2016 +0530

Fix memory allocation for interface request

In order to pass per lcore queue input error statistics to the
application that does vif query, we allocate only VR_MAX_CPUS
worth of memory, but we try to copy vr_num_cpus worth of data.
In the case of vr_num_cpus > VR_MAX_CPUS (64), we will hit a snag.
As a fix, allocate memory for vr_num_cpus.

Change-Id: Ifb1060aac20011b8d51e1b31063a363fe268fd3d
Closes-Bug: #1621816

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/24128
Committed: http://github.org/Juniper/contrail-vrouter/commit/a709ccd317c289e7790e51ccbb2e9e27ddbe80fb
Submitter: Zuul
Branch: R3.0

commit a709ccd317c289e7790e51ccbb2e9e27ddbe80fb
Author: Anand H. Krishnan <email address hidden>
Date: Fri Sep 9 19:20:27 2016 +0530

Fix memory allocation for interface request

In order to pass per lcore queue input error statistics to the
application that does vif query, we allocate only VR_MAX_CPUS
worth of memory, but we try to copy vr_num_cpus worth of data.
In the case of vr_num_cpus > VR_MAX_CPUS (64), we will hit a snag.
As a fix, allocate memory for vr_num_cpus.

Change-Id: Ifb1060aac20011b8d51e1b31063a363fe268fd3d
Closes-Bug: #1621816
