[Ubuntu 12.04 R2.1 Icehouse Build 27] Multiple VMIs on the same logical interface: deleting one VMI causes VRF delete

Bug #1420903 reported by chhandak
Affects              Status          Importance   Assigned to            Milestone
Juniper Openstack    Fix Committed   High         Prabhjot Singh Sethi
R2.1                 Fix Committed   High         Prabhjot Singh Sethi

Bug Description

When a bare-metal server runs multiple VMs that are all connected to the same logical interface, deleting one VMI causes the corresponding VRF to be deleted from the TOR agent node, even though the other VMIs are still associated with that logical interface and still belong to the deleted VRF.

This eventually leads to a TOR agent crash.

Backtrace for the TOR agent crash:
(gdb) bt
#0 0x00007f138faf1bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f138faf4fc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f138faeaa76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f138faeab22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000088d79d in VrfEntry::DeleteTimeout (this=<optimized out>)
    at controller/src/vnsw/agent/oper/vrf.cc:314
#5 0x0000000000d47a49 in operator() (this=<optimized out>)
    at /usr/include/boost/function/function_template.hpp:767
#6 Timer::TimerTask::Run (this=0x237ea80) at controller/src/base/timer.cc:42
#7 0x0000000000d3f290 in TaskImpl::execute (this=0x7f138936f640) at controller/src/base/task.cc:232
#8 0x00007f1390cf9b3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f1390cf5816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f1390cf4f4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f1390cf10ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f1390cf12f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f1390f15182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f138fbb5fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
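
The backtrace shows an assertion failing inside VrfEntry::DeleteTimeout, and the commit message later in this report attributes the timeout to routes that still hold a reference to the VRF. The failure mode can be sketched roughly as below. This is a minimal illustration only: SketchVrf and all of its members are hypothetical names, not the agent's real VrfEntry implementation, and the real timer and assertion are replaced by a plain boolean check.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a reference-counted VRF object.
// It is NOT the real VrfEntry class from contrail-controller.
class SketchVrf {
public:
    explicit SketchVrf(std::string name) : name_(std::move(name)) {}

    // A route (or any other dependent object) takes a reference.
    void AddRef() { ++refcount_; }

    // A dependent object drops its reference.
    void Release() {
        assert(refcount_ > 0);
        --refcount_;
    }

    // Deletion is only requested here; cleanup needs refcount_ == 0.
    void MarkDeleted() { deleted_ = true; }

    // True once the VRF is marked deleted and nothing references it.
    bool CanCleanUp() const { return deleted_ && refcount_ == 0; }

    // Models the check made when the delete timer fires. The sketch
    // returns false where the backtrace shows the agent asserting.
    bool DeleteTimeoutOk() const { return CanCleanUp(); }

    const std::string& name() const { return name_; }

private:
    std::string name_;
    int refcount_ = 0;
    bool deleted_ = false;
};
```

In this model, a VRF that is marked deleted while a route imported from the TOR still holds a reference never becomes eligible for cleanup, so the timeout check fails until that reference is released.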

chhandak (chhandak)
no longer affects: juniperopenstack/trunk
Nischal Sheth (nsheth)
tags: added: contrail-control
Changed in juniperopenstack:
importance: Undecided → High
tags: added: blocker
information type: Proprietary → Public
Changed in juniperopenstack:
assignee: nobody → Prabhjot Singh Sethi (prabhjot)
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/7353
Committed: http://github.org/Juniper/contrail-controller/commit/fed0d74f972857fd1e9d41f1ea33794b81631fab
Submitter: Zuul
Branch: R2.1

commit fed0d74f972857fd1e9d41f1ea33794b81631fab
Author: Prabhjot Singh Sethi <email address hidden>
Date: Thu Feb 12 00:08:47 2015 -0800

Fix Tor-Agent crash for VRF Delete timeout

Issue:
------
When a logical interface is associated with multiple VMIs
and the administrator removes one VMI from the associated
list, the existing VRF is deleted and then re-added. If any
routes imported from the TOR exist at that point, they hold
a reference to the VRF and do not allow it to clean up,
which results in a delete timeout.
In the usual scenario the VRF is deleted along with the VN,
so this issue was not observed earlier.

Fix:
----
Maintain a VRF dependency list in the UnicastMacLocalOvsdb
table and trigger re-evaluation of its entries on VRF delete.
Re-evaluation removes the OvsdbEntry for unicast MAC local
and re-adds it, which drops the VRF reference and moves the
entry to the add-defer state to wait for the new VRF object.

The re-evaluation is triggered from a work queue to ensure
the order of events, so that it kicks in only after the
vn_ovsdb_entry becomes inactive and the re-added route is
held in the add-defer state.

Also add a fix that marks a logical entry as incomplete
when the logical interface to VMI association is removed, to
allow the logical switch delete to complete.

Change-Id: Ifdcd06266b1eeefb36a4fcd90f3f0d0e2e471527
Closes-Bug: 1420903
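
The fix described in the commit message above (a per-VRF dependency list plus re-evaluation deferred through a work queue) can be sketched as follows. This is a simplified model under stated assumptions, not the merged code: SketchMacEntry, SketchMacTable, and every member name are hypothetical, VRFs are identified by plain strings, and the work queue is a simple deque that the caller drains by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of a unicast MAC local entry imported from the TOR.
struct SketchMacEntry {
    std::string mac;
    std::string vrf;            // empty when no VRF reference is held
    bool add_deferred = false;  // waiting for a new VRF object
};

// Hypothetical model of the table that owns the VRF dependency list.
class SketchMacTable {
public:
    // Record the entry under its VRF so a later VRF delete can find it.
    void Add(SketchMacEntry* entry, const std::string& vrf) {
        entry->vrf = vrf;
        entry->add_deferred = false;
        vrf_dep_[vrf].insert(entry);
    }

    // On VRF delete, queue the re-evaluation instead of running it
    // inline, so it executes after other pending state changes.
    void OnVrfDelete(const std::string& vrf) {
        work_queue_.push_back([this, vrf] { ReEvaluate(vrf); });
    }

    // Drain the queued work items in FIFO order.
    void RunQueue() {
        while (!work_queue_.empty()) {
            std::function<void()> job = std::move(work_queue_.front());
            work_queue_.pop_front();
            job();
        }
    }

    // Number of entries still holding a reference to the given VRF.
    std::size_t RefsOn(const std::string& vrf) const {
        auto it = vrf_dep_.find(vrf);
        return it == vrf_dep_.end() ? 0 : it->second.size();
    }

private:
    // Drop each dependent entry's VRF reference and park the entry in
    // the add-deferred state until a new VRF object appears.
    void ReEvaluate(const std::string& vrf) {
        auto it = vrf_dep_.find(vrf);
        if (it == vrf_dep_.end()) return;
        for (SketchMacEntry* entry : it->second) {
            entry->vrf.clear();
            entry->add_deferred = true;
        }
        vrf_dep_.erase(it);
    }

    std::map<std::string, std::set<SketchMacEntry*>> vrf_dep_;
    std::deque<std::function<void()>> work_queue_;
};
```

In this model the reference on the deleted VRF is released only when the queue is drained, which mirrors the ordering concern in the commit message: the re-evaluation runs after the other state changes have been processed, and the entry then waits in the add-deferred state.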

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/7366
Committed: http://github.org/Juniper/contrail-controller/commit/77ce1788beedc2b6334fd7b75adf442a2f33f680
Submitter: Zuul
Branch: master

commit 77ce1788beedc2b6334fd7b75adf442a2f33f680
Author: Prabhjot Singh Sethi <email address hidden>
Date: Thu Feb 12 00:08:47 2015 -0800

Fix Tor-Agent crash for VRF Delete timeout

Issue:
------
When a logical interface is associated with multiple VMIs
and the administrator removes one VMI from the associated
list, the existing VRF is deleted and then re-added. If any
routes imported from the TOR exist at that point, they hold
a reference to the VRF and do not allow it to clean up,
which results in a delete timeout.
In the usual scenario the VRF is deleted along with the VN,
so this issue was not observed earlier.

Fix:
----
Maintain a VRF dependency list in the UnicastMacLocalOvsdb
table and trigger re-evaluation of its entries on VRF delete.
Re-evaluation removes the OvsdbEntry for unicast MAC local
and re-adds it, which drops the VRF reference and moves the
entry to the add-defer state to wait for the new VRF object.

The re-evaluation is triggered from a work queue to ensure
the order of events, so that it kicks in only after the
vn_ovsdb_entry becomes inactive and the re-added route is
held in the add-defer state.

Also add a fix that marks a logical entry as incomplete
when the logical interface to VMI association is removed, to
allow the logical switch delete to complete.

Closes-Bug: 1420903
(cherry picked from commit fed0d74f972857fd1e9d41f1ea33794b81631fab)

Change-Id: Icfe97976f2211d83275edad9da893222ea9d8abf

Changed in juniperopenstack:
status: New → Fix Committed
Nischal Sheth (nsheth)
tags: removed: contrail-control