DPDK: Agent crash in VnUveEntry::UpdateInterVnStats

Bug #1569645 reported by Vinod Nair
This bug affects 3 people
Affects                                        Status          Importance  Assigned to  Milestone
Juniper Openstack (status tracked in Trunk)
  R3.0                                         Fix Committed   Critical    Ashok Singh
  Trunk                                        Fix Committed   Critical    Ashok Singh

Bug Description

Agent crash in VnUveEntry::UpdateInterVnStats.

The backtrace is below:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f3228114e3c in std::string::compare(std::string const&) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) bt
#0 0x00007f3228114e3c in std::string::compare(std::string const&) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x0000000000c60d48 in VnUveEntry::UpdateInterVnStats(std::string const&, unsigned long, unsigned long, bool) ()
#2 0x0000000000ba6026 in FlowStatsCollector::UpdateInterVnStats(FlowExportInfo*, unsigned long, unsigned long) ()
#3 0x0000000000ba625d in FlowStatsCollector::UpdateFlowStatsInternal(FlowExportInfo*, unsigned int, unsigned short, unsigned int, unsigned short, unsigned long, bool, unsigned long*, unsigned long*) ()
#4 0x0000000000ba76be in FlowStatsCollector::UpdateAndExportInternal(FlowExportInfo*, unsigned int, unsigned short, unsigned int, unsigned short, unsigned long, bool, RevFlowDepParams const*) ()
#5 0x0000000000ba77f8 in FlowStatsCollector::UpdateAndExportInternalLocked(FlowExportInfo*, unsigned int, unsigned short, unsigned int, unsigned short, unsigned long, bool, RevFlowDepParams const*) ()
#6 0x0000000000ba9207 in FlowStatsCollector::Run() ()
#7 0x0000000000bab11a in StatsCollector::TimerExpiry() ()
#8 0x000000000118db39 in Timer::TimerTask::Run() ()
#9 0x0000000001186b3c in TaskImpl::execute() ()
#10 0x00007f322837fb3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f322837b816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f322837af4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f32283770ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f32283772f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f322859b182 in start_thread (arg=0x7f321b3fc700) at pthread_create.c:312
#16 0x00007f322787447d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Build: 3.0.2.0-26~kilo
Core:

Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Ashok Singh (ashoksr)
Ashok Singh (ashoksr) wrote :

The crash happens because of parallel access between kTaskFlowStatsCollector and kTaskDBExclude.
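The race can be illustrated with a minimal C++ sketch. It assumes a simplified model in which each per-VN stats entry lives in a map. One thread (standing in for kTaskFlowStatsCollector) looks the entry up and updates its inter-VN counters, while another (standing in for kTaskDBExclude) erases it. If the two are not serialized, the updater can dereference a freed entry, which would be consistent with the SIGSEGV inside std::string::compare in the backtrace. In the sketch, both operations take the same std::mutex. VnStatsTable, VnStatsEntry and RunConcurrentUpdateAndDelete are hypothetical names, not the agent's actual classes or locks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a per-VN UVE entry (not the agent's VnUveEntry).
struct VnStatsEntry {
    std::map<std::string, std::uint64_t> bytes_to_vn;  // keyed by remote VN name
};

// Hypothetical table touched by a stats task (update) and a DB task (delete).
class VnStatsTable {
public:
    void Add(const std::string &vn) {
        std::lock_guard<std::mutex> guard(mutex_);
        entries_[vn].reset(new VnStatsEntry());
    }

    // Stats side: lookup and update run under one lock, so the entry cannot be
    // freed between finding it and writing to it. Returns false if it is gone.
    bool UpdateInterVnStats(const std::string &vn, const std::string &dst_vn,
                            std::uint64_t bytes) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = entries_.find(vn);
        if (it == entries_.end()) return false;
        it->second->bytes_to_vn[dst_vn] += bytes;
        return true;
    }

    // Delete side: takes the same lock, so it waits for an in-progress update.
    void Delete(const std::string &vn) {
        std::lock_guard<std::mutex> guard(mutex_);
        entries_.erase(vn);
    }

    std::uint64_t Bytes(const std::string &vn, const std::string &dst_vn) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = entries_.find(vn);
        if (it == entries_.end()) return 0;
        auto stat = it->second->bytes_to_vn.find(dst_vn);
        return stat == it->second->bytes_to_vn.end() ? 0 : stat->second;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::unique_ptr<VnStatsEntry>> entries_;
};

// Runs an updater thread and a deleter thread concurrently. The interleaving
// varies from run to run, but no update ever touches a freed entry.
bool RunConcurrentUpdateAndDelete() {
    VnStatsTable table;
    table.Add("vn1");
    std::thread stats([&table] {
        for (int i = 0; i < 10000; ++i) table.UpdateInterVnStats("vn1", "vn2", 1);
    });
    std::thread db([&table] { table.Delete("vn1"); });
    stats.join();
    db.join();
    return table.Bytes("vn1", "vn2") == 0;  // entry is gone once both finish
}
```

RunConcurrentUpdateAndDelete() returns true however the threads interleave: each update either completes before the delete or finds the entry already gone. Dropping the lock_guard lines would turn this back into a data race with undefined behavior.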

Changed in juniperopenstack:
status: New → Triaged
Jeba Paulaiyan (jebap)
Changed in juniperopenstack:
importance: High → Critical
milestone: none → r3.1.0.0-fcs
information type: Proprietary → Public
tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19351
Submitter: Ashok Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19351
Committed: http://github.org/Juniper/contrail-controller/commit/7fbb3cee261bea0c11f31bbedff9c1497dc1a6f9
Submitter: Zuul
Branch: R3.0

commit 7fbb3cee261bea0c11f31bbedff9c1497dc1a6f9
Author: ashoksingh <email address hidden>
Date: Sat Apr 16 09:23:18 2016 +0530

Fix Agent crash because of parallel access between kTaskFlowStatsCollector and
kTaskDBExclude

When kTaskFlowStatsCollector was updating stats in VnUveEntry, kTaskDBExclude
went ahead and deleted the VnUveEntry. There are other such structures for which
parallel access between kTaskFlowStatsCollector and kTaskDBExclude can cause
issues

Fixed all parallel access between kTaskFlowStatsCollector and kTaskDBExclude by
acquiring required locks. Also added task exclusion between kTaskDBExclude and
Agent::Uve to prevent simultaneous access of UVE data-structures between
kTaskDBExclude and Agent::Uve. The code executed under kTaskDBExclude was
earlier executed in db::DBTable which had exclusion with Agent::Uve.

Also added locks to prevent simultaneous access between kTaskFlowStatsCollector
and Agent::Uve.
Closes-Bug: #1569645

Change-Id: I416799a823abe187a32b0bdf4c04ac5996a8144b
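The commit also adds task exclusion between kTaskDBExclude and Agent::Uve. The sketch below is a rough model of scheduler-level exclusion. It is an assumption made for illustration and is not the contrail TaskScheduler API. It tracks which tasks are running and refuses to start a task while any task it excludes is still running. ExclusionPolicy, AddExclusion, TryStart and Finish are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>
#include <string>

// Hypothetical, simplified model of scheduler-level task exclusion. It only
// shows the rule "a task may start only if no task it excludes is running".
class ExclusionPolicy {
public:
    // Declare that tasks a and b must never run at the same time.
    void AddExclusion(const std::string &a, const std::string &b) {
        std::lock_guard<std::mutex> guard(mutex_);
        excludes_[a].insert(b);
        excludes_[b].insert(a);
    }

    // Returns true and counts the task as running if nothing it excludes is
    // running; otherwise returns false and the caller must retry later.
    bool TryStart(const std::string &task) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const std::string &other : excludes_[task]) {
            auto it = running_.find(other);
            if (it != running_.end() && it->second > 0) return false;
        }
        ++running_[task];
        return true;
    }

    // Marks one running instance of the task as finished.
    void Finish(const std::string &task) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = running_.find(task);
        if (it != running_.end() && it->second > 0) --it->second;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::set<std::string>> excludes_;
    std::map<std::string, int> running_;  // running instance count per task
};
```

For example, after AddExclusion("kTaskDBExclude", "Agent::Uve"), TryStart("Agent::Uve") fails while kTaskDBExclude is running. A task with no declared exclusion, such as kTaskFlowStatsCollector, can still start alongside either one. Tasks that are allowed to overlap must protect shared data with locks instead, which matches the commit's use of locks for kTaskFlowStatsCollector.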

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/19360
Submitter: Ashok Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19360
Committed: http://github.org/Juniper/contrail-controller/commit/4de3a4e025b44445eff373e5233d18e4c5c4e9ba
Submitter: Zuul
Branch: master

commit 4de3a4e025b44445eff373e5233d18e4c5c4e9ba
Author: ashoksingh <email address hidden>
Date: Sat Apr 16 09:23:18 2016 +0530

Fix Agent crash because of parallel access between kTaskFlowStatsCollector and
kTaskDBExclude

When kTaskFlowStatsCollector was updating stats in VnUveEntry, kTaskDBExclude
went ahead and deleted the VnUveEntry. There are other such structures for which
parallel access between kTaskFlowStatsCollector and kTaskDBExclude can cause
issues

Fixed all parallel access between kTaskFlowStatsCollector and kTaskDBExclude by
acquiring required locks. Also added task exclusion between kTaskDBExclude and
Agent::Uve to prevent simultaneous access of UVE data-structures between
kTaskDBExclude and Agent::Uve. The code executed under kTaskDBExclude was
earlier executed in db::DBTable which had exclusion with Agent::Uve.

Also added locks to prevent simultaneous access between kTaskFlowStatsCollector
and Agent::Uve.
Closes-Bug: #1569645

(cherry picked from commit 7fbb3cee261bea0c11f31bbedff9c1497dc1a6f9)

Change-Id: I744691ffe29c8121289bae91f53009f3acc36786
