Compute node kernel panic

Bug #1697264 reported by Ruslan Usichenko
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenContrail
New
Undecided
Unassigned

Bug Description

Hi Team,

I ran into an issue: when I ping another VM located on a different compute node, or a public host, with a packet size larger than the interface MTU (e.g. ping -s 2000 8.8.8.8), the compute node crashes with a kernel panic. I see this behavior only with Linux kernels 4.x and 3.19; kernel 3.13 is not affected. I've attached the kernel crash dump and the kernel traceback.

OpenContrail version: 3.1.1
OpenStack release: Mitaka
Linux distro: Ubuntu 16.04 HWE
Linux kernel: 4.8.0-54-generic
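For context on the trigger: the on-wire size of that ping is the ICMP payload plus the ICMP and IPv4 headers, which is well above a standard 1500-byte MTU, so the kernel has to fragment (and, through the vrouter datapath, segment) the packet on transmit. A quick sanity check of the arithmetic, assuming IPv4 with no options:

```shell
# ping -s 2000 sends a 2000-byte ICMP payload; the kernel adds
# an 8-byte ICMP header and a 20-byte IPv4 header (no options).
payload=2000
icmp_hdr=8
ipv4_hdr=20
echo $((payload + icmp_hdr + ipv4_hdr))   # 2028 bytes on the wire, > 1500-byte MTU
```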

vrouter module info

root@cmp001:/# modinfo vrouter
filename: /lib/modules/4.8.0-54-generic/updates/dkms/vrouter.ko
version: 1.0
license: GPL
srcversion: 615F009C28F6CDD7B3DAE9E
depends:
vermagic: 4.8.0-54-generic SMP mod_unload modversions
parm: vr_flow_entries:uint
parm: vr_oflow_entries:uint
parm: vr_bridge_entries:uint
parm: vr_bridge_oentries:uint
parm: vr_mpls_labels:uint
parm: vr_nexthops:uint
parm: vr_vrfs:uint
parm: vr_flow_hold_limit:uint
parm: vr_interfaces:uint
parm: vrouter_dbg:Set 1 for pkt dumping and 0 to disable, default value is 0 (int)

kernel traceback

[ 144.309486] BUG: unable to handle kernel paging request at ffff94ef80000000
[ 144.309705] IP: [<ffffffffa2a3d302>] __memcpy+0x12/0x20
[ 144.309863] PGD faec3c067 PUD 0
[ 144.310062] Oops: 0000 [#1] SMP
[ 144.310155] Modules linked in: veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables xfs binfmt_misc vrouter(OE) intel_rapl sb_edac bridge 8021q ipmi_ssif garp mrp stp edac_core llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass joydev intel_cstate input_leds intel_rapl_perf ipmi_si ipmi_msghandler mac_hid shpchp acpi_power_meter lpc_ich nf_conntrack ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net vhost macvtap macvlan autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 144.314736] xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd megaraid_sas fnic libfcoe igb usbhid libfc hid dca ptp scsi_transport_fc enic pps_core i2c_algo_bit wmi fjes
[ 144.316721] CPU: 7 PID: 10130 Comm: kworker/7:3 Tainted: G OE 4.8.0-54-generic #57~16.04.1-Ubuntu
[ 144.316860] Hardware name: Cisco Systems Inc N1K-1110-X/UCSC-C220-M3S, BIOS C220M3.2.0.3.0.080120140402 08/01/2014
[ 144.317012] Workqueue: events lh_work [vrouter]
[ 144.317161] task: ffff94e749680000 task.stack: ffff94e1fd404000
[ 144.317261] RIP: 0010:[<ffffffffa2a3d302>] [<ffffffffa2a3d302>] __memcpy+0x12/0x20
[ 144.317446] RSP: 0018:ffff94e1fd4079a8 EFLAGS: 00010206
[ 144.317543] RAX: ffff94ed933c7c1c RBX: ffff94e1ff318700 RCX: 000000001973ac4c
[ 144.317646] RDX: 0000000000000006 RSI: ffff94ef7ffffffc RDI: ffff94edc71719fc
[ 144.317750] RBP: ffff94e1fd407a90 R08: 00000000000000c0 R09: ffff94e75f807340
[ 144.317853] R10: 0000000000000062 R11: ffff94ee92b47c00 R12: 0000000000000588
[ 144.317957] R13: 000000000000005e R14: ffff94e7561e8100 R15: ffff94ef4b9d62f0
[ 144.318060] FS: 0000000000000000(0000) GS:ffff94e75fdc0000(0000) knlGS:0000000000000000
[ 144.318193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 144.318292] CR2: ffff94ef80000000 CR3: 00000008587ee000 CR4: 00000000000426e0
[ 144.318395] Stack:
[ 144.318482] ffffffffa2d6ea81 ffffffffa2d7dc6e 0000000000000046 ffffffffffffffac
[ 144.318828] ffff94e7561e8100 fffffff400000000 000000000088000e 00000000ff780046
[ 144.319220] 0001ffff000000b2 000000000000000e 0000000000000054 ffff94e1ff318700
[ 144.319565] Call Trace:
[ 144.319657] [<ffffffffa2d6ea81>] ? skb_segment+0x351/0xbc0
[ 144.319756] [<ffffffffa2d7dc6e>] ? dev_forward_skb+0x1e/0x30
[ 144.319862] [<ffffffffc0716b6a>] linux_xmit+0x16a/0x2e0 [vrouter]
[ 144.319966] [<ffffffffc07165c3>] linux_xmit_segment+0x53/0x420 [vrouter]
[ 144.320072] [<ffffffffc0716dc6>] linux_if_tx+0xe6/0x390 [vrouter]
[ 144.320178] [<ffffffffc073011e>] ? vr_htable_get_hentry_by_index+0xe/0x30 [vrouter]
[ 144.320329] [<ffffffffc07234d2>] eth_tx+0x92/0x200 [vrouter]
[ 144.320437] [<ffffffffc07226ae>] ? vif_cmn_rewrite.part.8+0x3e/0x90 [vrouter]
[ 144.320572] [<ffffffffc071c76e>] nh_vxlan_tunnel+0x11e/0x230 [vrouter]
[ 144.320679] [<ffffffffc071db02>] nh_output+0x42/0xd0 [vrouter]
[ 144.320783] [<ffffffffc072fb83>] vr_bridge_input+0x2c3/0x560 [vrouter]
[ 144.320886] [<ffffffffa29fc5c3>] ? generic_make_request+0x33/0x1d0
[ 144.320990] [<ffffffffa26b8183>] ? update_curr+0xf3/0x180
[ 144.321093] [<ffffffffc0720aac>] vr_reinject_packet+0x7c/0xb0 [vrouter]
[ 144.321199] [<ffffffffc072b6ef>] vr_flow_flush_pnode+0xcf/0x160 [vrouter]
[ 144.321306] [<ffffffffc072b995>] vr_flush_entry.isra.34+0x75/0xa0 [vrouter]
[ 144.321413] [<ffffffffc072ba8c>] vr_flow_work+0xcc/0x1c0 [vrouter]
[ 144.321518] [<ffffffffc0712734>] lh_work+0x14/0x20 [vrouter]
[ 144.321618] [<ffffffffa269d89b>] process_one_work+0x16b/0x4a0
[ 144.321718] [<ffffffffa269dc1b>] worker_thread+0x4b/0x500
[ 144.321816] [<ffffffffa269dbd0>] ? process_one_work+0x4a0/0x4a0
[ 144.321916] [<ffffffffa269dbd0>] ? process_one_work+0x4a0/0x4a0
[ 144.322017] [<ffffffffa26a3fb8>] kthread+0xd8/0xf0
[ 144.322114] [<ffffffffa2e9aa9f>] ret_from_fork+0x1f/0x40
[ 144.322213] [<ffffffffa26a3ee0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 144.322314] Code: 21 2f a3 e8 51 1a ca ff 0f 31 48 c1 e2 20 48 09 d0 48 31 c3 e9 6d ff ff ff 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
[ 144.326057] RIP [<ffffffffa2a3d302>] __memcpy+0x12/0x20
[ 144.326209] RSP <ffff94e1fd4079a8>
[ 144.326300] CR2: ffff94ef80000000
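One plausible reading of the register dump, assuming the standard x86-64 `__memcpy` convention (RSI = source, RDI = destination, RDX = remaining byte count): the 6-byte tail copy starting at RSI crosses into the unmapped page at the faulting address in CR2, i.e. the copy source runs off the end of a mapped region while skb_segment() is rebuilding the oversized packet. A quick check with the values taken verbatim from the oops:

```shell
rsi=0xffff94ef7ffffffc   # copy source pointer (RSI)
rdx=0x6                  # remaining bytes to copy (RDX)
cr2=0xffff94ef80000000   # faulting (unmapped) address (CR2)
# The copy range [rsi, rsi+rdx) straddles cr2:
[ $((rsi)) -lt $((cr2)) ] && [ $((rsi + rdx)) -gt $((cr2)) ] \
    && echo "copy crosses the fault address"
```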

Tags: vrouter
Jakub Pavlik (pavlk-jakub) wrote :

That's weird, because my test below works (see the output). Do you have hugepages enabled, or anything else specific in your setup?

$ sudo ip link set mtu 1400 dev eth0
$ ping -s 1410 192.168.50.3
PING 192.168.50.3 (192.168.50.3): 1410 data bytes
1418 bytes from 192.168.50.3: seq=0 ttl=64 time=1.683 ms
1418 bytes from 192.168.50.3: seq=1 ttl=64 time=0.458 ms
1418 bytes from 192.168.50.3: seq=2 ttl=64 time=0.627 ms
1418 bytes from 192.168.50.3: seq=3 ttl=64 time=0.626 ms
1418 bytes from 192.168.50.3: seq=4 ttl=64 time=0.603 ms
1418 bytes from 192.168.50.3: seq=5 ttl=64 time=0.619 ms
1418 bytes from 192.168.50.3: seq=6 ttl=64 time=0.612 ms
^C
--- 192.168.50.3 ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 0.458/0.746/1.683 ms
$ exit
Connection to 169.254.0.3 closed.
root@cmp001:~# uname -a
Linux cmp001 4.8.0-54-lowlatency #57~16.04.1-Ubuntu SMP PREEMPT Wed May 24 18:53:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Ruslan Usichenko (rusichenko) wrote :

Hi,

No, I don't have hugepages enabled or anything else specific. I also tried disabling offloading on the NIC; it didn't help.
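For reference, a typical way to rule out segmentation offloads on the physical NIC with ethtool looks like the sketch below; eth0 is a placeholder for the actual fabric interface, and per the comment above this reportedly did not change the outcome here:

```shell
# Show the current offload settings for the interface (eth0 is a placeholder).
ethtool -k eth0 | grep -E 'segmentation|receive-offload'
# Disable TSO/GSO/GRO, the offloads most likely to interact with skb_segment().
ethtool -K eth0 tso off gso off gro off
```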
