VRouter kernel Oops

Bug #1349258 reported by Édouard Thuleau
This bug affects 1 person
Affects       Status         Importance  Assigned to        Milestone
OpenContrail  Fix Committed  Undecided   Anand H. Krishnan  -

Bug Description

During source NAT development, I got some vrouter kernel traces.
To confirm, I ran a simple test:

$ cd /opt/stack/contrail/controller/src/config/utils
$ while true; do echo -n "creating... "; python contrail_veth_port.py --subnet 10.0.0.0/24 test_vm private; echo ok; echo -n "deleting... "; python contrail_veth_port.py --delete --subnet 10.0.0.0/24 test_vm private; echo ok; done

12 hours later, I got this kernel trace:

44321 Jul 26 23:17:32 localhost kernel: [118557.448900] RTNL: assertion failed at /build/buildd/linux-3.2.0/net/core/dev.c (3218)
44322 Jul 26 23:17:32 localhost kernel: [118557.450256] Pid: 13339, comm: contrail-vroute Tainted: G O 3.2.0-64-virtual #97-Ubuntu
44323 Jul 26 23:17:32 localhost kernel: [118557.450259] Call Trace:
44324 Jul 26 23:17:32 localhost kernel: [118557.450272] [<ffffffff8153f8a9>] netdev_rx_handler_unregister+0x59/0x60
44325 Jul 26 23:17:32 localhost kernel: [118557.450313] [<ffffffffa0171fd0>] linux_if_del_tap+0x90/0xa0 [vrouter]
44326 Jul 26 23:17:32 localhost kernel: [118557.450320] [<ffffffffa017a21b>] eth_drv_del+0x1b/0x60 [vrouter]
44327 Jul 26 23:17:32 localhost kernel: [118557.450331] [<ffffffffa017aece>] vif_delete+0x2e/0x40 [vrouter]
44328 Jul 26 23:17:32 localhost kernel: [118557.450340] [<ffffffffa017b53c>] vr_interface_req_process+0x13c/0x250 [vrouter]
44329 Jul 26 23:17:32 localhost kernel: [118557.450345] [<ffffffffa016c29a>] sandesh_decode_one+0x10a/0x1d0 [vrouter]
44330 Jul 26 23:17:32 localhost kernel: [118557.450352] [<ffffffffa016dd30>] ? thrift_binary_protocol_write_field_begin+0x80/0x80 [vrouter]
44331 Jul 26 23:17:32 localhost kernel: [118557.450358] [<ffffffffa016cf70>] ? thrift_binary_protocol_write_message_end+0x10/0x10 [vrouter]
44332 Jul 26 23:17:32 localhost kernel: [118557.450364] [<ffffffffa016dd50>] ? thrift_binary_protocol_write_sandesh_begin+0x20/0x20 [vrouter]
44333 Jul 26 23:17:32 localhost kernel: [118557.450369] [<ffffffffa016cf60>] ? thrift_protocol_skip+0x240/0x240 [vrouter]
44334 Jul 26 23:17:32 localhost kernel: [118557.450374] [<ffffffffa016cf80>] ? thrift_binary_protocol_write_sandesh_end+0x10/0x10 [vrouter]
44335 Jul 26 23:17:32 localhost kernel: [118557.450379] [<ffffffffa016cf90>] ? thrift_binary_protocol_write_struct_begin+0x10/0x10 [vrouter]
44336 Jul 26 23:17:32 localhost kernel: [118557.450384] [<ffffffffa016dcb0>] ? thrift_binary_protocol_write_set_begin+0x10/0x10 [vrouter]
44337 Jul 26 23:17:32 localhost kernel: [118557.450389] [<ffffffffa016cfa0>] ? thrift_binary_protocol_write_struct_end+0x10/0x10 [vrouter]
44338 Jul 26 23:17:32 localhost kernel: [118557.450395] [<ffffffffa016dc80>] ? thrift_binary_protocol_write_map_begin+0xa0/0xa0 [vrouter]
44339 Jul 26 23:17:32 localhost kernel: [118557.450400] [<ffffffffa016dbe0>] ? thrift_binary_protocol_write_list_begin+0x80/0x80 [vrouter]
44340 Jul 26 23:17:32 localhost kernel: [118557.450405] [<ffffffffa016cfb0>] ? thrift_binary_protocol_write_field_end+0x10/0x10 [vrouter]
44341 Jul 26 23:17:32 localhost kernel: [118557.450410] [<ffffffffa016db60>] ? thrift_binary_protocol_write_bool+0x20/0x20 [vrouter]
44342 Jul 26 23:17:32 localhost kernel: [118557.450415] [<ffffffffa016cfc0>] ? thrift_binary_protocol_write_map_end+0x10/0x10 [vrouter]
44343 Jul 26 23:17:32 localhost kernel: [118557.450420] [<ffffffffa016dca0>] ? thrift_binary_protocol_write_field_stop+0x20/0x20 [vrouter]
44344 Jul 26 23:17:32 localhost kernel: [118557.450425] [<ffffffffa016cfd0>] ? thrift_binary_protocol_write_list_end+0x10/0x10 [vrouter]
44345 Jul 26 23:17:32 localhost kernel: [118557.450430] [<ffffffffa016db40>] ? thrift_binary_protocol_write_string+0x50/0x50 [vrouter]
44346 Jul 26 23:17:32 localhost kernel: [118557.450435] [<ffffffffa016dac0>] ? thrift_binary_protocol_write_i16+0x40/0x40 [vrouter]
44347 Jul 26 23:17:32 localhost kernel: [118557.450440] [<ffffffffa016da80>] ? thrift_binary_protocol_write_i32+0x30/0x30 [vrouter]
44348 Jul 26 23:17:32 localhost kernel: [118557.450445] [<ffffffffa016da50>] ? thrift_binary_protocol_write_i64+0x60/0x60 [vrouter]
44349 Jul 26 23:17:32 localhost kernel: [118557.450450] [<ffffffffa016d9f0>] ? thrift_binary_protocol_write_u16+0x40/0x40 [vrouter]
44350 Jul 26 23:17:32 localhost kernel: [118557.450455] [<ffffffffa016d9b0>] ? thrift_binary_protocol_write_ipv4+0x10/0x10 [vrouter]
44351 Jul 26 23:17:32 localhost kernel: [118557.450460] [<ffffffffa016d970>] ? thrift_binary_protocol_write_u64+0x60/0x60 [vrouter]
44352 Jul 26 23:17:32 localhost kernel: [118557.450465] [<ffffffffa016d910>] ? thrift_binary_protocol_write_binary+0x90/0x90 [vrouter]
44353 Jul 26 23:17:32 localhost kernel: [118557.450469] [<ffffffffa016d9a0>] ? thrift_binary_protocol_write_u32+0x30/0x30 [vrouter]
44354 Jul 26 23:17:32 localhost kernel: [118557.450474] [<ffffffffa016cfe0>] ? thrift_binary_protocol_write_set_end+0x10/0x10 [vrouter]
44355 Jul 26 23:17:32 localhost kernel: [118557.450479] [<ffffffffa016daf0>] ? thrift_binary_protocol_write_byte+0x30/0x30 [vrouter]
44356 Jul 26 23:17:32 localhost kernel: [118557.450484] [<ffffffffa016d880>] ? thrift_binary_protocol_read_message_begin+0x100/0x100 [vrouter]
44357 Jul 26 23:17:32 localhost kernel: [118557.450489] [<ffffffffa016daf0>] ? thrift_binary_protocol_write_byte+0x30/0x30 [vrouter]
44358 Jul 26 23:17:32 localhost kernel: [118557.450494] [<ffffffffa016d760>] ? thrift_binary_protocol_read_field_begin+0xa0/0xa0 [vrouter]
44359 Jul 26 23:17:32 localhost kernel: [118557.450500] [<ffffffffa016d000>] ? thrift_binary_protocol_read_message_end+0x10/0x10 [vrouter]
44360 Jul 26 23:17:32 localhost kernel: [118557.450504] [<ffffffffa016d780>] ? thrift_binary_protocol_read_sandesh_begin+0x20/0x20 [vrouter]
44361 Jul 26 23:17:32 localhost kernel: [118557.450510] [<ffffffffa016cff0>] ? thrift_binary_protocol_write_double+0x10/0x10 [vrouter]
44362 Jul 26 23:17:32 localhost kernel: [118557.450515] [<ffffffffa016d010>] ? thrift_binary_protocol_read_sandesh_end+0x10/0x10 [vrouter]
44363 Jul 26 23:17:32 localhost kernel: [118557.450520] [<ffffffffa016d030>] ? thrift_binary_protocol_read_struct_begin+0x20/0x20 [vrouter]
44364 Jul 26 23:17:32 localhost kernel: [118557.450525] [<ffffffffa016d6c0>] ? thrift_binary_protocol_read_map_begin+0xd0/0xd0 [vrouter]
44365 Jul 26 23:17:32 localhost kernel: [118557.450530] [<ffffffffa016d040>] ? thrift_binary_protocol_read_struct_end+0x10/0x10 [vrouter]
44366 Jul 26 23:17:32 localhost kernel: [118557.450535] [<ffffffffa016d5f0>] ? thrift_binary_protocol_read_list_begin+0xb0/0xb0 [vrouter]
44367 Jul 26 23:17:32 localhost kernel: [118557.450540] [<ffffffffa016d050>] ? thrift_binary_protocol_read_field_end+0x10/0x10 [vrouter]
44368 Jul 26 23:17:32 localhost kernel: [118557.450545] [<ffffffffa016d540>] ? thrift_binary_protocol_read_set_begin+0x10/0x10 [vrouter]
44369 Jul 26 23:17:32 localhost kernel: [118557.450550] [<ffffffffa016d060>] ? thrift_binary_protocol_read_map_end+0x10/0x10 [vrouter]
44370 Jul 26 23:17:32 localhost kernel: [118557.450554] [<ffffffffa016d530>] ? thrift_binary_protocol_read_string+0xd0/0xd0 [vrouter]
44371 Jul 26 23:17:32 localhost kernel: [118557.450559] [<ffffffffa016d070>] ? thrift_binary_protocol_read_list_end+0x10/0x10 [vrouter]
44372 Jul 26 23:17:32 localhost kernel: [118557.450564] [<ffffffffa016d1e0>] ? thrift_binary_protocol_read_byte+0x50/0x50 [vrouter]
44373 Jul 26 23:17:32 localhost kernel: [118557.450569] [<ffffffffa016d190>] ? thrift_binary_protocol_read_i64+0x80/0x80 [vrouter]
44374 Jul 26 23:17:32 localhost kernel: [118557.450574] [<ffffffffa016d330>] ? thrift_binary_protocol_read_i32+0x50/0x50 [vrouter]
44375 Jul 26 23:17:32 localhost kernel: [118557.450579] [<ffffffffa016d2e0>] ? thrift_binary_protocol_read_u16+0x50/0x50 [vrouter]
44376 Jul 26 23:17:32 localhost kernel: [118557.450584] [<ffffffffa016d110>] ? thrift_binary_protocol_read_u64+0x80/0x80 [vrouter]
44377 Jul 26 23:17:32 localhost kernel: [118557.450589] [<ffffffffa016d290>] ? thrift_binary_protocol_read_ipv4+0x10/0x10 [vrouter]
44378 Jul 26 23:17:32 localhost kernel: [118557.450593] [<ffffffffa016d230>] ? thrift_binary_protocol_read_bool+0x50/0x50 [vrouter]
44379 Jul 26 23:17:32 localhost kernel: [118557.450598] [<ffffffffa016d090>] ? thrift_binary_protocol_read_double+0x10/0x10 [vrouter]
44380 Jul 26 23:17:32 localhost kernel: [118557.450603] [<ffffffffa016d280>] ? thrift_binary_protocol_read_u32+0x50/0x50 [vrouter]
44381 Jul 26 23:17:32 localhost kernel: [118557.450608] [<ffffffffa016d080>] ? thrift_binary_protocol_read_set_end+0x10/0x10 [vrouter]
44382 Jul 26 23:17:32 localhost kernel: [118557.450613] [<ffffffffa016d460>] ? thrift_binary_protocol_read_binary+0xe0/0xe0 [vrouter]
44383 Jul 26 23:17:32 localhost kernel: [118557.450618] [<ffffffffa016d380>] ? thrift_binary_protocol_read_i16+0x50/0x50 [vrouter]
44384 Jul 26 23:17:32 localhost kernel: [118557.450622] [<ffffffffa016d460>] ? thrift_binary_protocol_read_binary+0xe0/0xe0 [vrouter]
44385 Jul 26 23:17:32 localhost kernel: [118557.450628] [<ffffffffa016e130>] ? thrift_transport_flush+0x20/0x20 [vrouter]
44386 Jul 26 23:17:32 localhost kernel: [118557.450634] [<ffffffffa016e140>] ? thrift_memory_buffer_is_open+0x10/0x10 [vrouter]
44387 Jul 26 23:17:32 localhost kernel: [118557.450639] [<ffffffffa016e150>] ? thrift_memory_buffer_open+0x10/0x10 [vrouter]
44388 Jul 26 23:17:32 localhost kernel: [118557.450644] [<ffffffffa016e190>] ? thrift_memory_buffer_flush+0x10/0x10 [vrouter]
44389 Jul 26 23:17:32 localhost kernel: [118557.450649] [<ffffffffa016e160>] ? thrift_memory_buffer_close+0x10/0x10 [vrouter]
44390 Jul 26 23:17:32 localhost kernel: [118557.450654] [<ffffffffa016e220>] ? thrift_memory_buffer_read+0x90/0x90 [vrouter]
44391 Jul 26 23:17:32 localhost kernel: [118557.450659] [<ffffffffa016e170>] ? thrift_memory_buffer_read_end+0x10/0x10 [vrouter]
44392 Jul 26 23:17:32 localhost kernel: [118557.450664] [<ffffffffa016e180>] ? thrift_memory_buffer_write_end+0x10/0x10 [vrouter]
44393 Jul 26 23:17:32 localhost kernel: [118557.450669] [<ffffffffa016c170>] ? sandesh_hdr_free+0x10/0x10 [vrouter]
44394 Jul 26 23:17:32 localhost kernel: [118557.450674] [<ffffffffa016c426>] sandesh_decode+0x46/0x80 [vrouter]
44395 Jul 26 23:17:32 localhost kernel: [118557.450681] [<ffffffffa01746e4>] sandesh_proto_decode+0x24/0x30 [vrouter]
44396 Jul 26 23:17:32 localhost kernel: [118557.450688] [<ffffffffa0174130>] vr_message_request+0x40/0x50 [vrouter]
44397 Jul 26 23:17:32 localhost kernel: [118557.450694] [<ffffffffa0173b1c>] netlink_trans_request+0x5c/0x210 [vrouter]
44398 Jul 26 23:17:32 localhost kernel: [118557.450699] [<ffffffff8156ea8f>] ? genl_family_find_byid+0x2f/0x60
44399 Jul 26 23:17:32 localhost kernel: [118557.450702] [<ffffffff8156ef65>] genl_rcv_msg+0x1d5/0x250
44400 Jul 26 23:17:32 localhost kernel: [118557.450705] [<ffffffff8156e0fe>] ? netlink_unicast+0x2be/0x300
44401 Jul 26 23:17:32 localhost kernel: [118557.450708] [<ffffffff8156ed90>] ? genl_rcv+0x40/0x40
44402 Jul 26 23:17:32 localhost kernel: [118557.450710] [<ffffffff8156e829>] netlink_rcv_skb+0xa9/0xd0
44403 Jul 26 23:17:32 localhost kernel: [118557.450713] [<ffffffff8156ed75>] genl_rcv+0x25/0x40
44404 Jul 26 23:17:32 localhost kernel: [118557.450716] [<ffffffff8156e0f0>] netlink_unicast+0x2b0/0x300
44405 Jul 26 23:17:32 localhost kernel: [118557.450719] [<ffffffff8153688c>] ? __alloc_skb+0x8c/0x240
44406 Jul 26 23:17:32 localhost kernel: [118557.450722] [<ffffffff8156e41e>] netlink_sendmsg+0x2de/0x390
44407 Jul 26 23:17:32 localhost kernel: [118557.450726] [<ffffffff8152c99e>] sock_sendmsg+0x10e/0x130
44408 Jul 26 23:17:32 localhost kernel: [118557.450730] [<ffffffff8105668d>] ? set_next_entity+0xad/0xd0
44409 Jul 26 23:17:32 localhost kernel: [118557.450733] [<ffffffff8105682a>] ? finish_task_switch+0x4a/0xf0
44410 Jul 26 23:17:32 localhost kernel: [118557.450737] [<ffffffff8152efc4>] ? move_addr_to_kernel+0x64/0x70
44411 Jul 26 23:17:32 localhost kernel: [118557.450740] [<ffffffff8153a4c2>] ? verify_iovec+0x52/0xd0
44412 Jul 26 23:17:32 localhost kernel: [118557.450743] [<ffffffff8152e01f>] ___sys_sendmsg+0x3bf/0x3e0
44413 Jul 26 23:17:32 localhost kernel: [118557.450752] [<ffffffff8106074e>] ? try_to_wake_up+0x18e/0x200
44414 Jul 26 23:17:32 localhost kernel: [118557.450755] [<ffffffff81060ae0>] ? wake_up_state+0x10/0x20
44415 Jul 26 23:17:32 localhost kernel: [118557.450764] [<ffffffff8109e466>] ? wake_futex+0x76/0xa0
44416 Jul 26 23:17:32 localhost kernel: [118557.450767] [<ffffffff8109f7c3>] ? futex_wake+0x113/0x130
44417 Jul 26 23:17:32 localhost kernel: [118557.450771] [<ffffffff8152ff39>] __sys_sendmsg+0x49/0x90
44418 Jul 26 23:17:32 localhost kernel: [118557.450774] [<ffffffff8152ff99>] sys_sendmsg+0x19/0x20
44419 Jul 26 23:17:32 localhost kernel: [118557.450780] [<ffffffff81665842>] system_call_fastpath+0x16/0x1b
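
Most of the frames in the trace above are prefixed with "?". In a Linux kernel call trace, that prefix marks an address that was found on the stack but that the unwinder could not confirm as part of the actual call chain. Below is a minimal sketch, assuming the log line layout shown above, that keeps only the unmarked frames. The names FRAME_RE and reliable_frames are illustrative and are not part of vrouter or of any kernel tooling.

```python
import re

# Matches one frame of a kernel call trace, for example:
#   "... [<ffffffffa0171fd0>] linux_if_del_tap+0x90/0xa0 [vrouter]"
# The optional "? " marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def reliable_frames(lines):
    """Return (symbol, module) pairs for frames not marked with "?".

    Frames with no "[module]" suffix belong to the core kernel image
    and are reported with the module name "kernel".
    """
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None or match.group("unreliable"):
            continue
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few lines copied from the trace in this report.
sample = [
    "Jul 26 23:17:32 localhost kernel: [118557.450272] [<ffffffff8153f8a9>] netdev_rx_handler_unregister+0x59/0x60",
    "Jul 26 23:17:32 localhost kernel: [118557.450313] [<ffffffffa0171fd0>] linux_if_del_tap+0x90/0xa0 [vrouter]",
    "Jul 26 23:17:32 localhost kernel: [118557.450352] [<ffffffffa016dd30>] ? thrift_binary_protocol_write_field_begin+0x80/0x80 [vrouter]",
    "Jul 26 23:17:32 localhost kernel: [118557.450674] [<ffffffffa016c426>] sandesh_decode+0x46/0x80 [vrouter]",
]

for symbol, module in reliable_frames(sample):
    print(f"{module}: {symbol}")
```

Applied to the whole trace, this reduces it to the chain from sys_sendmsg through the netlink and sandesh handlers down to vif_delete, linux_if_del_tap and netdev_rx_handler_unregister, where the RTNL assertion is reported.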

I got that trace a second time (12 hours after the first one), but the test continued to run for 2 more days without any errors.
I also noticed another problem: 3 days after I started the test, each loop iteration took about 30 seconds longer to run (i.e. 5-6 seconds at the beginning and 35 seconds 3 days later).
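
To track that slowdown, the create/delete loop can be wrapped so that each iteration is timed. This is only a sketch: it assumes the same script name and arguments as the shell loop above, and the helper names (iteration_commands, timed_run, measure) are made up for illustration. The run and clock parameters exist only so the timing logic can be exercised without actually creating ports.

```python
import subprocess
import time


def iteration_commands(script="contrail_veth_port.py", subnet="10.0.0.0/24",
                       vm="test_vm", network="private"):
    """Build the create and delete command lines used by the shell loop."""
    create = ["python", script, "--subnet", subnet, vm, network]
    delete = ["python", script, "--delete", "--subnet", subnet, vm, network]
    return create, delete


def timed_run(cmd, run=subprocess.check_call, clock=time.monotonic):
    """Run one command and return how long it took, in seconds."""
    start = clock()
    run(cmd)
    return clock() - start


def measure(iterations, run=subprocess.check_call, clock=time.monotonic):
    """Return the duration of each create + delete iteration."""
    create, delete = iteration_commands()
    durations = []
    for _ in range(iterations):
        elapsed = timed_run(create, run, clock) + timed_run(delete, run, clock)
        durations.append(elapsed)
    return durations
```

Calling measure(100) from the directory that contains the script and logging the returned list with a timestamp would show whether the duration grows gradually or jumps at some point.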

Tags: vrouter
description: updated
Édouard Thuleau (ethuleau) wrote :

I also needed a quick and dirty patch to be able to run the script "contrail_veth_port.py".
I have attached it.

Pedro Marques (5-roque)
Changed in opencontrail:
assignee: nobody → Divakar Dharanalakota (ddivakar)
Anand H. Krishnan (anandhk) wrote :

Fixed with commit

70b50d015fd9280e6255b8314804d2998df3a15f

Review

https://review.opencontrail.org/#/c/1104/

Changed in opencontrail:
assignee: Divakar Dharanalakota (ddivakar) → Anand H. Krishnan (anandhk)
status: New → Fix Committed