5.0: Agent crash in tbb::internal::IntelSchedulerTraits

Bug #1768322 reported by Vinod Nair on 2018-05-01
This bug affects 1 person
Affects            Status                   Importance  Assigned to   Milestone
Juniper Openstack  Status tracked in Trunk
  R3.2             Fix Committed            High        sangarshan p
  R3.2.3.x         Fix Committed            High        sangarshan p
  R5.0             Fix Committed            High        sangarshan p
  Trunk            Fix Committed            High        sangarshan p

Bug Description

On a 5.0 DPDK-Mellanox cluster, the agent crashed in tbb::internal::IntelSchedulerTraits.

The core file is in /cs-shared/bugs/1768322

The backtrace is as below:

(gdb) bt
#0 0x00007f0e920731f7 in raise () from /lib64/libc.so.6
#1 0x00007f0e920748e8 in abort () from /lib64/libc.so.6
#2 0x00007f0e9206c266 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f0e9206c312 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000012d28d8 in NhDecode(Agent const*, NextHop const*, PktInfo const*, PktFlowInfo*, PktControlInfo*, PktControlInfo*, bool, EcmpLoadBalance const&) ()
#5 0x00000000012d420b in PktFlowInfo::EgressProcess(PktInfo const*, PktControlInfo*, PktControlInfo*) ()
#6 0x00000000012d5630 in PktFlowInfo::Process(PktInfo const*, PktControlInfo*, PktControlInfo*) ()
#7 0x00000000012e9005 in FlowHandler::Run() ()
#8 0x00000000012e2dad in Proto::RunProtoHandler(ProtoHandler*) ()
#9 0x00000000012c0c71 in FlowProto::FlowEventHandler(FlowEvent*, FlowTable*) ()
#10 0x00000000012e68dd in FlowEventQueueBase::Handler(FlowEvent*) ()
#11 0x00000000012c632f in QueueTaskRunner<FlowEvent*, WorkQueue<FlowEvent*> >::RunQueue() ()
#12 0x0000000000e8c82f in TaskImpl::execute() ()
#13 0x00007f0e92c428ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
#14 0x00007f0e92c3e5b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
#15 0x00007f0e92c3dc8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
#16 0x00007f0e92c3b67f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
#17 0x00007f0e92c3b879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
#18 0x00007f0e92e5de25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f0e9213634d in clone () from /lib64/libc.so.6
(gdb) info threads
  Id Target Id Frame
  11 Thread 0x7f0e8a70e700 (LWP 17320) 0x00007f0e92e64a9b in recv () from /lib64/libpthread.so.0
  10 Thread 0x7f0e8b311700 (LWP 17317) 0x00007f0e9211ae47 in sched_yield () from /lib64/libc.so.6
  9 Thread 0x7f0e89c0d700 (LWP 20330) 0x00007f0e92c42252 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task(long&, bool) () from /lib64/libtbb.so.2
  8 Thread 0x7f0e8c315700 (LWP 17313) 0x00007f0e921307f9 in syscall () from /lib64/libc.so.6
  7 Thread 0x7f0e8ab0f700 (LWP 17319) 0x00007f0e921307f9 in syscall () from /lib64/libc.so.6
  6 Thread 0x7f0e8bf14700 (LWP 17314) 0x00007f0e92e6470d in read () from /lib64/libpthread.so.0
  5 Thread 0x7f0e8b712700 (LWP 17315) 0x00007f0e9211ae47 in sched_yield () from /lib64/libc.so.6
  4 Thread 0x7f0e8980c700 (LWP 14006) 0x00007f0e9211ae47 in sched_yield () from /lib64/libc.so.6
  3 Thread 0x7f0e8af10700 (LWP 17318) 0x00007f0e9211ae47 in sched_yield () from /lib64/libc.so.6
  2 Thread 0x7f0e9587b8c0 (LWP 17274) 0x00007f0e92136923 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x7f0e8bb13700 (LWP 17316) 0x00007f0e920731f7 in raise () from /lib64/libc.so.6
(gdb)

Vinod Nair (vinodnair) on 2018-05-01
description: updated
information type: Proprietary → Public
Vinod Nair (vinodnair) on 2018-05-04
tags: added: sanityblocker
sangarshan p (sangarshp) wrote :

Fix is ready; unit testing is in progress. ETA: 28/06

Review in progress for https://review.opencontrail.org/44190
Submitter: sangarshan p (<email address hidden>)

Review in progress for https://review.opencontrail.org/44368
Submitter: sangarshan p (<email address hidden>)

Review in progress for https://review.opencontrail.org/44445
Submitter: sangarshan p (<email address hidden>)

Reviewed: https://review.opencontrail.org/44368
Committed: http://github.com/Juniper/contrail-controller/commit/e5df8e0a1c8c59a487f878942802d3eb36a323d2
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit e5df8e0a1c8c59a487f878942802d3eb36a323d2
Author: sangarshp <email address hidden>
Date: Mon Jul 9 08:34:36 2018 +0530

Check Gen id for processing recompute events for flows

It is possible for an enqueued forward flow to become a reverse flow when
flows get evicted from vrouter and traffic is received for the reverse flow.

Made changes to also pass gen_id when a flow is enqueued for recompute.
When the event is processed for recompute, check whether the gen id present
in the event matches the gen id of the flow; if it does not match,
ignore the event.

Change-Id: Ib647a157ecd852a3520a90ffba5f392ae3b33e1e
Closes-Bug: #1768322
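
The check described in this commit message can be illustrated with a minimal sketch. This is not the actual agent code: Flow, RecomputeEvent, FlowTable and all of their members are hypothetical, simplified stand-ins, and the only idea taken from the commit message is that the gen id is captured when the event is enqueued and compared with the flow's current gen id when the event is processed.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>

// Hypothetical, simplified stand-ins; these are NOT the contrail agent types.
struct Flow {
    uint32_t handle;       // index of the flow entry
    uint32_t gen_id;       // generation of the entry; bumped when the entry is reused
    bool reverse;          // true if the entry currently holds a reverse flow
    int recompute_count;   // how many recompute events were applied to this entry
};

struct RecomputeEvent {
    uint32_t handle;
    uint32_t gen_id;       // generation captured at enqueue time
};

class FlowTable {
public:
    void Add(const Flow& f) { flows_[f.handle] = f; }

    // Models eviction followed by reuse of the same entry, possibly as a reverse flow.
    void Reuse(uint32_t handle, bool reverse) {
        Flow& f = flows_.at(handle);
        ++f.gen_id;
        f.reverse = reverse;
        f.recompute_count = 0;
    }

    // Record the current gen id together with the flow handle.
    void EnqueueRecompute(uint32_t handle) {
        queue_.push_back(RecomputeEvent{handle, flows_.at(handle).gen_id});
    }

    // Process queued events; returns how many were applied (stale ones are ignored).
    int Drain() {
        int processed = 0;
        while (!queue_.empty()) {
            RecomputeEvent ev = queue_.front();
            queue_.pop_front();
            auto it = flows_.find(ev.handle);
            if (it == flows_.end() || it->second.gen_id != ev.gen_id) {
                continue;  // gen id mismatch: the entry changed after enqueue
            }
            ++it->second.recompute_count;
            ++processed;
        }
        return processed;
    }

    const Flow& Get(uint32_t handle) const { return flows_.at(handle); }

private:
    std::unordered_map<uint32_t, Flow> flows_;
    std::deque<RecomputeEvent> queue_;
};

// Usage sketch: an event enqueued before the entry is reused is dropped.
void Demo() {
    FlowTable table;
    table.Add(Flow{7, 1, false, 0});
    table.EnqueueRecompute(7);      // event carries gen_id 1
    table.Reuse(7, true);           // entry is now a reverse flow with gen_id 2
    assert(table.Drain() == 0);     // stale event ignored
    table.EnqueueRecompute(7);      // event carries gen_id 2
    assert(table.Drain() == 1);     // matching event applied
}
```

Without the comparison in Drain(), the first event would be applied to an entry that no longer represents the flow it was enqueued for, which is the situation the commit message describes.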

Reviewed: https://review.opencontrail.org/44445
Committed: http://github.com/Juniper/contrail-controller/commit/4e86c5268312c9471621d3b0850d6db63e10d5e5
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 4e86c5268312c9471621d3b0850d6db63e10d5e5
Author: sangarshp <email address hidden>
Date: Sun Jul 8 18:24:35 2018 +0530

Check Gen id for processing recompute events for flows

It is possible for an enqueued forward flow to become a reverse flow when
flows get evicted from vrouter and traffic is received for the reverse flow.

Made changes to also pass gen_id when a flow is enqueued for recompute.
When the event is processed for recompute, check whether the gen id present
in the event matches the gen id of the flow; if it does not match,
ignore the event.

Change-Id: Ib647a157ecd852a3520a90ffba5f392ae3b33e1e
Closes-Bug: #1768322

Review in progress for https://review.opencontrail.org/44803
Submitter: sangarshan p (<email address hidden>)

Review in progress for https://review.opencontrail.org/44843
Submitter: sangarshan p (<email address hidden>)

Reviewed: https://review.opencontrail.org/44843
Committed: http://github.com/Juniper/contrail-controller/commit/c065431169a0e3c9bdfc19d1d9ab60547bee1c11
Submitter: Zuul (<email address hidden>)
Branch: R3.2.3.x

commit c065431169a0e3c9bdfc19d1d9ab60547bee1c11
Author: sangarshp <email address hidden>
Date: Mon Jul 9 08:34:36 2018 +0530

Check Gen id for processing recompute events for flows

It is possible for an enqueued forward flow to become a reverse flow when
flows get evicted from vrouter and traffic is received for the reverse flow.

Made changes to also pass gen_id when a flow is enqueued for recompute.
When the event is processed for recompute, check whether the gen id present
in the event matches the gen id of the flow; if it does not match,
ignore the event.

Change-Id: Ib647a157ecd852a3520a90ffba5f392ae3b33e1e
Closes-Bug: #1768322
(cherry picked from commit e5df8e0a1c8c59a487f878942802d3eb36a323d2)

Reviewed: https://review.opencontrail.org/44803
Committed: http://github.com/Juniper/contrail-controller/commit/5443d37614df851853cc4d9e090974f6893ae96b
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 5443d37614df851853cc4d9e090974f6893ae96b
Author: sangarshp <email address hidden>
Date: Mon Jul 9 08:34:36 2018 +0530

Check Gen id for processing recompute events for flows

It is possible for an enqueued forward flow to become a reverse flow when
flows get evicted from vrouter and traffic is received for the reverse flow.

Made changes to also pass gen_id when a flow is enqueued for recompute.
When the event is processed for recompute, check whether the gen id present
in the event matches the gen id of the flow; if it does not match,
ignore the event.

Change-Id: Ib647a157ecd852a3520a90ffba5f392ae3b33e1e
Closes-Bug: #1768322
(cherry picked from commit e5df8e0a1c8c59a487f878942802d3eb36a323d2)
