vrouter kernel oops, needed to reboot compute nodes

Bug #1398484 reported by Stefan Andres
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Juniper Openstack   Fix Committed  Critical    Anand H. Krishnan
R1.1                Fix Committed  Undecided   Unassigned
R2.0                Won't Fix      Undecided   Anand H. Krishnan

Bug Description

Hey,

We are load testing our 1.20 installation by spawning roughly 50-200 VMs in parallel. It may or may not be related, but today we hit a kernel oops on most of the compute nodes, and they only started working again after we rebooted all server nodes (not the VMs).

[1041796.641356] Write: Encode sandesh vr_interface_req FAILED(4)
[1041814.272462] BUG: unable to handle kernel NULL pointer dereference at (null)
[1041814.288716] IP: [<ffffffffa03e3efe>] vr_nexthop_add+0xee/0xa40 [vrouter]
[1041814.296190] PGD 2362007067 PUD 3690dc5067 PMD 0
[1041814.303985] Oops: 0000 [#1] SMP
[1041814.312422] Modules linked in: vhost_net vhost macvtap macvlan vrouter(OX) sch_fq_codel xt_multiport xt_LOG xt_limit xt_comment dm_multipath scsi_dh nbd kvm_intel ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables btrfs raid6_pq xor ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c ipmi_devintf isofs x86_pkg_temp_thermal intel_powerclamp coretemp kvm gpio_ich crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ioatdma aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si wmi cryptd acpi_power_meter hpwdt lpc_ich serio_raw hpilo mac_hid bonding 8021q garp stp mrp llc lp parport psmouse ixgbe tg3 dca ptp pps_core hpsa mdio [last unloaded: vrouter]
[1041814.462111] CPU: 3 PID: 16572 Comm: contrail-vroute Tainted: G OX 3.13.0-39-generic #66-Ubuntu
[1041814.498742] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 08/26/2014
[1041814.518731] task: ffff881fcf833000 ti: ffff883fd139a000 task.ti: ffff883fd139a000
[1041814.560085] RIP: 0010:[<ffffffffa03e3efe>] [<ffffffffa03e3efe>] vr_nexthop_add+0xee/0xa40 [vrouter]
[1041814.604351] RSP: 0018:ffff883fd139b708 EFLAGS: 00010287
[1041814.628038] RAX: 0000000000000030 RBX: ffff881f13188c00 RCX: ffff88129b951600
[1041814.676670] RDX: 0000000000000000 RSI: 00000000000001a0 RDI: ffff881fcf0988a0
[1041814.728896] RBP: ffff883fd139b740 R08: 0000000000000121 R09: 0000000000000002
[1041814.782495] R10: ffff88129b9517a0 R11: ffffffffa03d7574 R12: ffff881fd086f240
[1041814.836890] R13: 000000000000001a R14: 0000000000000000 R15: ffff88129b951790
[1041814.891509] FS: 00007f414ae07700(0000) GS:ffff881fffa60000(0000) knlGS:0000000000000000
[1041814.947152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1041814.974929] CR2: 0000000000000000 CR3: 0000003fa8497000 CR4: 00000000001427e0
[1041815.028982] Stack:
[1041815.055115] 0000000000000190 0000000000000019 ffffffffa03f5000 ffff881f13188c00
[1041815.107808] 0000000000000166 ffff883fd139ba24 ffff8815cb6ba018 ffff883fd139b758
[1041815.160335] ffffffffa03e4ab5 ffffffffa03f5000 ffff883fd139b9d0 ffffffffa03d7fba
[1041815.212822] Call Trace:
[1041815.238305] [<ffffffffa03e4ab5>] vr_nexthop_req_process+0x75/0x90 [vrouter]
[1041815.290678] [<ffffffffa03d7fba>] sandesh_decode_one+0x10a/0x1d0 [vrouter]
[1041815.317098] [<ffffffffa03d9e00>] ? thrift_transport_flush+0x20/0x20 [vrouter]
[1041815.368593] [<ffffffffa03d9e10>] ? thrift_memory_buffer_is_open+0x10/0x10 [vrouter]
[1041815.420643] [<ffffffffa03d9e20>] ? thrift_memory_buffer_open+0x10/0x10 [vrouter]
[1041815.473130] [<ffffffffa03d9e60>] ? thrift_memory_buffer_flush+0x10/0x10 [vrouter]
[1041815.525141] [<ffffffffa03d9e30>] ? thrift_memory_buffer_close+0x10/0x10 [vrouter]
[1041815.577636] [<ffffffffa03d9ef0>] ? thrift_memory_buffer_read+0x90/0x90 [vrouter]
[1041815.629568] [<ffffffffa03d9e40>] ? thrift_memory_buffer_read_end+0x10/0x10 [vrouter]
[1041815.682221] [<ffffffffa03d9e50>] ? thrift_memory_buffer_write_end+0x10/0x10 [vrouter]
[1041815.735105] [<ffffffffa03d8e80>] ? thrift_binary_protocol_write_message_begin+0x80/0x80 [vrouter]
[1041815.788777] [<ffffffffa03d8ce0>] ? thrift_binary_protocol_write_message_end+0x10/0x10 [vrouter]
[1041815.844465] [<ffffffffa03d8e00>] ? thrift_binary_protocol_read_double+0x10/0x10 [vrouter]
[1041815.900441] [<ffffffffa03d8cd0>] ? thrift_protocol_skip+0x270/0x270 [vrouter]
[1041815.955932] [<ffffffffa03d8cf0>] ? thrift_binary_protocol_write_sandesh_end+0x10/0x10 [vrouter]
[1041816.012625] [<ffffffffa03d8d00>] ? thrift_binary_protocol_write_struct_begin+0x10/0x10 [vrouter]
[1041816.070684] [<ffffffffa03d8fc0>] ? thrift_binary_protocol_write_bool+0x20/0x20 [vrouter]
[1041816.129169] [<ffffffffa03d8d10>] ? thrift_binary_protocol_write_struct_end+0x10/0x10 [vrouter]
[1041816.188289] [<ffffffffa03d8ea0>] ? thrift_binary_protocol_write_sandesh_begin+0x20/0x20 [vrouter]
[1041816.246070] [<ffffffffa03d8ec0>] ? thrift_binary_protocol_write_field_stop+0x20/0x20 [vrouter]
[1041816.304325] [<ffffffffa03d8d20>] ? thrift_binary_protocol_write_field_end+0x10/0x10 [vrouter]
[1041816.362549] [<ffffffffa03d8f40>] ? thrift_binary_protocol_write_map_begin+0x80/0x80 [vrouter]
[1041816.420901] [<ffffffffa03d8d30>] ? thrift_binary_protocol_write_map_end+0x10/0x10 [vrouter]
[1041816.479479] [<ffffffffa03d9020>] ? thrift_binary_protocol_write_field_begin+0x60/0x60 [vrouter]
[1041816.538316] [<ffffffffa03d8d40>] ? thrift_binary_protocol_write_list_end+0x10/0x10 [vrouter]
[1041816.596566] [<ffffffffa03d8fa0>] ? thrift_binary_protocol_write_list_begin+0x60/0x60 [vrouter]
[1041816.655115] [<ffffffffa03d9030>] ? thrift_binary_protocol_write_set_begin+0x10/0x10 [vrouter]
[1041816.713960] [<ffffffffa03d91d0>] ? thrift_binary_protocol_write_binary+0x80/0x80 [vrouter]
[1041816.772927] [<ffffffffa03d9250>] ? thrift_binary_protocol_write_u16+0x40/0x40 [vrouter]
[1041816.831039] [<ffffffffa03d9060>] ? thrift_binary_protocol_write_byte+0x30/0x30 [vrouter]
[1041816.889020] [<ffffffffa03d9210>] ? thrift_binary_protocol_write_i16+0x40/0x40 [vrouter]
[1041816.947991] [<ffffffffa03d9280>] ? thrift_binary_protocol_write_i32+0x30/0x30 [vrouter]
[1041817.006503] [<ffffffffa03d90c0>] ? thrift_binary_protocol_write_i64+0x60/0x60 [vrouter]
[1041817.064818] [<ffffffffa03d92b0>] ? thrift_binary_protocol_write_u32+0x30/0x30 [vrouter]
[1041817.122620] [<ffffffffa03d8d50>] ? thrift_binary_protocol_write_set_end+0x10/0x10 [vrouter]
[1041817.180604] [<ffffffffa03d92c0>] ? thrift_binary_protocol_write_ipv4+0x10/0x10 [vrouter]
[1041817.238177] [<ffffffffa03d9150>] ? thrift_binary_protocol_write_uuid_t+0x30/0x30 [vrouter]
[1041817.296926] [<ffffffffa03d92c0>] ? thrift_binary_protocol_write_ipv4+0x10/0x10 [vrouter]
[1041817.354911] [<ffffffffa03d9120>] ? thrift_binary_protocol_write_u64+0x60/0x60 [vrouter]
[1041817.412690] [<ffffffffa03d93e0>] ? thrift_binary_protocol_read_message_begin+0xd0/0xd0 [vrouter]
[1041817.470454] [<ffffffffa03d8d70>] ? thrift_binary_protocol_read_message_end+0x10/0x10 [vrouter]
[1041817.528117] [<ffffffffa03d9310>] ? thrift_binary_protocol_write_string+0x50/0x50 [vrouter]
[1041817.585381] [<ffffffffa03d8d60>] ? thrift_binary_protocol_write_double+0x10/0x10 [vrouter]
[1041817.642525] [<ffffffffa03d8d80>] ? thrift_binary_protocol_read_sandesh_end+0x10/0x10 [vrouter]
[1041817.700197] [<ffffffffa03d8da0>] ? thrift_binary_protocol_read_struct_begin+0x20/0x20 [vrouter]
[1041817.757625] [<ffffffffa03d9540>] ? thrift_binary_protocol_read_list_begin+0x90/0x90 [vrouter]
[1041817.816164] [<ffffffffa03d8db0>] ? thrift_binary_protocol_read_struct_end+0x10/0x10 [vrouter]
[1041817.873858] [<ffffffffa03d9400>] ? thrift_binary_protocol_read_sandesh_begin+0x20/0x20 [vrouter]
[1041817.931824] [<ffffffffa03d8dc0>] ? thrift_binary_protocol_read_field_end+0x10/0x10 [vrouter]
[1041817.989595] [<ffffffffa03d94b0>] ? thrift_binary_protocol_read_map_begin+0xb0/0xb0 [vrouter]
[1041818.047447] [<ffffffffa03d8dd0>] ? thrift_binary_protocol_read_map_end+0x10/0x10 [vrouter]
[1041818.105155] [<ffffffffa03d95d0>] ? thrift_binary_protocol_read_field_begin+0x90/0x90 [vrouter]
[1041818.163003] [<ffffffffa03d8de0>] ? thrift_binary_protocol_read_list_end+0x10/0x10 [vrouter]
[1041818.220774] [<ffffffffa03d95e0>] ? thrift_binary_protocol_read_set_begin+0x10/0x10 [vrouter]
[1041818.279004] [<ffffffffa03d9630>] ? thrift_binary_protocol_read_bool+0x50/0x50 [vrouter]
[1041818.336699] [<ffffffffa03d9680>] ? thrift_binary_protocol_read_byte+0x50/0x50 [vrouter]
[1041818.394349] [<ffffffffa03d96d0>] ? thrift_binary_protocol_read_i16+0x50/0x50 [vrouter]
[1041818.452346] [<ffffffffa03d9720>] ? thrift_binary_protocol_read_i32+0x50/0x50 [vrouter]
[1041818.510599] [<ffffffffa03d97a0>] ? thrift_binary_protocol_read_i64+0x80/0x80 [vrouter]
[1041818.568591] [<ffffffffa03d97f0>] ? thrift_binary_protocol_read_u16+0x50/0x50 [vrouter]
[1041818.626210] [<ffffffffa03d9850>] ? thrift_binary_protocol_read_ipv4+0x10/0x10 [vrouter]
[1041818.683970] [<ffffffffa03d9840>] ? thrift_binary_protocol_read_u32+0x50/0x50 [vrouter]
[1041818.741524] [<ffffffffa03d8df0>] ? thrift_binary_protocol_read_set_end+0x10/0x10 [vrouter]
[1041818.799014] [<ffffffffa03d99e0>] ? thrift_binary_protocol_read_binary+0xe0/0xe0 [vrouter]
[1041818.856252] [<ffffffffa03d9900>] ? thrift_binary_protocol_read_uuid_t+0x30/0x30 [vrouter]
[1041818.913682] [<ffffffffa03d99e0>] ? thrift_binary_protocol_read_binary+0xe0/0xe0 [vrouter]
[1041818.970998] [<ffffffffa03d98d0>] ? thrift_binary_protocol_read_u64+0x80/0x80 [vrouter]
[1041819.028430] [<ffffffffa03d7e90>] ? sandesh_hdr_free+0x10/0x10 [vrouter]
[1041819.057317] [<ffffffffa03d8136>] sandesh_decode+0x46/0x90 [vrouter]
[1041819.085783] [<ffffffffa03e04f4>] sandesh_proto_decode+0x24/0x30 [vrouter]
[1041819.113894] [<ffffffffa03dff6e>] vr_message_request+0x3e/0x60 [vrouter]
[1041819.141608] [<ffffffffa03df9c6>] netlink_trans_request+0x56/0x190 [vrouter]
[1041819.195580] [<ffffffff8165227d>] genl_family_rcv_msg+0x18d/0x370
[1041819.222732] [<ffffffff81652460>] ? genl_family_rcv_msg+0x370/0x370
[1041819.249292] [<ffffffff816524f1>] genl_rcv_msg+0x91/0xd0
[1041819.275505] [<ffffffff81650579>] netlink_rcv_skb+0xa9/0xc0
[1041819.301493] [<ffffffff81650a78>] genl_rcv+0x28/0x40
[1041819.326611] [<ffffffff8164fb85>] netlink_unicast+0xd5/0x1b0
[1041819.351267] [<ffffffff8164ff80>] netlink_sendmsg+0x320/0x760
[1041819.375083] [<ffffffff8164ce24>] ? netlink_rcv_wake+0x44/0x60
[1041819.398608] [<ffffffff8164de82>] ? netlink_recvmsg+0x1a2/0x3a0
[1041819.421220] [<ffffffff8160a70b>] sock_sendmsg+0x8b/0xc0
[1041819.443257] [<ffffffff81199cd9>] ? mpol_misplaced+0x189/0x250
[1041819.465128] [<ffffffff8160a3fe>] ? move_addr_to_kernel.part.16+0x1e/0x60
[1041819.487162] [<ffffffff8160afc1>] ? move_addr_to_kernel+0x21/0x30
[1041819.508068] [<ffffffff8160af93>] ___sys_sendmsg+0x3c3/0x3d0
[1041819.528547] [<ffffffff8172ade4>] ? __do_page_fault+0x204/0x560
[1041819.548497] [<ffffffff810a0175>] ? set_next_entity+0x95/0xb0
[1041819.567917] [<ffffffff8160b692>] __sys_sendmsg+0x42/0x80
[1041819.586865] [<ffffffff8160b6e2>] SyS_sendmsg+0x12/0x20
[1041819.605307] [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1041819.623204] Code: 00 0f b7 d2 83 fa 01 76 5b 83 ea 02 b8 10 00 00 00 48 8d 72 02 48 c1 e6 04 eb 0b 66 90 48 83 c0 10 48 39 f0 74 3e 48 8b 54 01 08 <80> 3a 06 75 ed f6 42 03 04 74 e7 48 89 51 08 49 8b 54 24 20 8b
[1041819.677043] RIP [<ffffffffa03e3efe>] vr_nexthop_add+0xee/0xa40 [vrouter]
[1041819.694572] RSP <ffff883fd139b708>
[1041819.711435] CR2: 0000000000000000
[1041819.758469] ---[ end trace f622e76c06d15ca7 ]---
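
For triage, the faulting location and the reliable part of the call chain can be pulled out of a dump like the one above with a short script. This is only a minimal sketch and not part of the original report: the helper names `parse_frame`, `faulting_frame`, and `reliable_call_chain` are hypothetical, and the regular expression assumes the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in this 3.13 kernel log. Frames that the kernel prefixes with `?` are treated as unreliable and skipped.

```python
import re

# One stack frame, e.g.
#   "[<ffffffffa03e3efe>] vr_nexthop_add+0xee/0xa40 [vrouter]"
# An optional "? " before the symbol marks a frame the unwinder is unsure of.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return a dict describing the frame on this line, or None if there is none."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),          # None for core kernel symbols
        "reliable": m.group("guess") is None,
    }


def faulting_frame(log_text):
    """Return the frame from the first 'IP:' or 'RIP' line of the oops."""
    for line in log_text.splitlines():
        if re.search(r"\bR?IP\b", line):
            frame = parse_frame(line)
            if frame is not None:
                return frame
    return None


def reliable_call_chain(log_text):
    """Return symbols listed under 'Call Trace:' that are not marked with '?'."""
    chain = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        frame = parse_frame(line)
        if frame is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        if frame["reliable"]:
            chain.append(frame["symbol"])
    return chain
```

Run against the saved dmesg output, `faulting_frame(text)` identifies the instruction pointer's symbol, offset, and module, and `reliable_call_chain(text)` drops the long run of `?`-prefixed thrift frames so that the path from the `sendmsg` system call down into the vrouter module is easier to read.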

Stefan Andres (s-andres)
information type: Proprietary → Public
Pedro Marques (5-roque)
Changed in juniperopenstack:
importance: Undecided → Critical
tags: added: vrouter
Changed in juniperopenstack:
assignee: nobody → Anand H. Krishnan (anandhk)
tags: added: customer
Martin Gerhard Loschwitz (martin-loschwitz) wrote :

For reference, the version of vrouter that we use is at Git commit 63b16769ac8a78286723905157015947daf18ef0.

Anand H. Krishnan (anandhk) wrote :
Anand H. Krishnan (anandhk) wrote :

Code has changed

Anand H. Krishnan (anandhk) wrote :

Code has changed in mainline.

Duplicate of

https://bugs.launchpad.net/juniperopenstack/+bug/1393201

Changed in juniperopenstack:
status: New → Fix Committed