SYMC:Rapid network create and delete causes a race condition

Bug #1482277 reported by Rudrajit Tapadar
This bug affects 1 person
Affects             Status                    Importance  Assigned to       Milestone
Juniper Openstack   Status tracked in Trunk
  R2.0              Fix Committed             High        Prakash Bailkeri
  R2.20             Fix Committed             High        Prakash Bailkeri
  Trunk             Fix Committed             High        Prakash Bailkeri
OpenContrail        Fix Released              High        Prakash Bailkeri

Bug Description

Rapidly creating and deleting a network causes a race condition in the state updates on the controllers. As a result, some controllers have the correct state and routes when the network is re-created, while the affected controllers have no routes for that network and keep it in a "deleted=True" state.

Revision history for this message
Prakash Bailkeri (prakashmb) wrote :

gcore and binary in http://mayamruga/Docs/bugs/1482277/

gcore analysis:

(gdb) pdb_table_listeners 0x7f3ef0aaf970
 DBtable 0x7f3ef0aaf970(DLP:dlp-AutomationFramework:snat-si-left_si_73d9d015-7f4e-4426-8b9f-7f9a5421cd39:service-3b6023db-a05b-4770-8cd1-5b765466a8d2-DLP_dlp-AutomationFramework_si_73d9d015-7f4e-4426-8b9f-7f9a5421cd39.inet.0) has following clients
-----------------------------------------------------
  ListenerId Callback
-----------------------------------------------------
(gdb) pdb_table_entries 0x7f3ef0aaf970
  Entries in DB Table 0x7f3ef0aaf970
--------------------------------------------
   Entry ptr Entry flags
--------------------------------------------
  0x7f3ef0d2f430 0x00000002

(gdb) pdb_entry_states 0x7f3ef0d2f430
  DBEntry 0x7f3ef0d2f430 has following states
-----------------------------------------------------
    ListenerId DBState ptr
-----------------------------------------------------
         0 0x7f3ef994f0e0

(gdb) p *(DBState *) 0x7f3ef994f0e0
$64 = (RtReplicated) {
  <DBState> = {
    _vptr.DBState = 0xa9a170 <vtable for RtReplicated+16>
  },
  members of RtReplicated:
  replicate_list_ = std::set with 6 elements = {
    [0] = {
      table_ = 0x2b992a0,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef0cb45b0
    },
    [1] = {
      table_ = 0x7f3ece943670,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef0ccc870
    },
    [2] = {
      table_ = 0x7f3ef081eec0,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef176c8e0
    },
    [3] = {
      table_ = 0x7f3ef0829db0,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef0cccb50
    },
    [4] = {
      table_ = 0x7f3ef089d600,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef0ccc500
    },
    [5] = {
      table_ = 0x7f3ef45dc590,
      peer_ = 0x0,
      path_id_ = 171086486,
      src_ = BgpPath::StaticRoute,
      rt_ = 0x7f3ef10c0a10
    }
  }
}

p ((DBTable *) 0x2b992a0)->name_
$113 = "bgp.l3vpn.0"
deleted_ = False
p ((DBTable *) 0x7f3ece943670)->name_
$114 = "DLP:dlp-AutomationFramework:dlp-AutomationFramework-infra-net:dlp-AutomationFramework-infra-net.inet.0"
deleted_ = False
p ((DBTable *) 0x7f3ef081eec0)->name_
$115 = "DLP:dlp-AutomationFramework:ASH2-postfix-net:ASH2-postfix-net.inet.0"
deleted_ = True
p ((DBTable *) 0x7f3ef0829db0)->name_
$116 = "DLP:dlp-AutomationFramework:ASH2-csg-net:ASH2-csg-net.inet.0"
deleted_ = True
p ((DBTable *) 0x7f3ef089d600)->name_
$117 = "DLP:dlp-AutomationFramework:ASH2-cds-c3b3e709-8d5c-4438-9148-3a5844f73ded-net:ASH2-cds-c3b3e709-8d5c-4438-9148-3a5844f73ded-net.inet.0"
deleted_ = True
p ((DBTable *) 0x7f3ef45dc590)->name_
$118 = "DLP:dlp-AutomationFramework:ASH2-cds-5a000f99-02e5-409f-80e7-919fd1c12be7-net:ASH2-cds-5a000f99-02e5-409f-80e7-919fd1c12be7-net.inet.0"
deleted_ = True
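
One way to read the dump above is to tabulate the six `replicate_list_` entries against the `deleted_` flag of their destination tables. The sketch below does only that. It assumes each `deleted_` line belongs to the `name_` printed just before it, and the variable names are illustrative, not part of any tool.

```python
# Destination table pointers from replicate_list_, mapped to the deleted_
# flag read from the dump. Assumption: each deleted_ line refers to the
# table whose name_ is printed directly above it.
replicas = {
    "0x2b992a0": False,       # bgp.l3vpn.0
    "0x7f3ece943670": False,  # ...dlp-AutomationFramework-infra-net.inet.0
    "0x7f3ef081eec0": True,   # ...ASH2-postfix-net.inet.0
    "0x7f3ef0829db0": True,   # ...ASH2-csg-net.inet.0
    "0x7f3ef089d600": True,   # ...ASH2-cds-c3b3e709-...-net.inet.0
    "0x7f3ef45dc590": True,   # ...ASH2-cds-5a000f99-...-net.inet.0
}

# Replicated routes that still point into tables already marked deleted.
stale = [ptr for ptr, deleted in replicas.items() if deleted]

print(f"{len(stale)} of {len(replicas)} replicas sit in deleted tables")
```

Under that reading, four of the six replicated routes live in tables already marked deleted, while the source table itself shows no registered listeners that could clean them up.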

(gdb) p *(InetRoute *) 0x7f3ef176c8e0
$122 = (InetRoute) {
  <BgpRoute> = {
    <Route> = {
      <DBEntry> = {
        <DBEntryBase> = {
 ...

Revision history for this message
Prakash Bailkeri (prakashmb) wrote :

Root cause:
When a routing instance is deleted, the route path replicator walks the route table as part of the Leave of the RtGroup for its import and export route targets. During the walk, it deletes all the replicated paths/routes. On walk completion, it unregisters from the DBTable.
In the static route scenario, the static route is added on the internal routing instance that has the "static-route-entries" property attached to it. The generated static route is replicated to the destination VRF based on the "route-target-list" config in "static-route-entries". Note: such an internal routing instance doesn't have these route targets in its export_rt.

If this internal routing instance is deleted, the route path replicator starts the table walk as part of the Leave of the RtGroup in import and export. If the static route module has not yet processed the config delete of the static route entries, it will not have deleted the static route it added to the inet route table. On walk completion, the replicator unregisters from the routing table. Hence the replicated routes of the static route are never deleted: the RouteReplicator module is no longer a table listener, so it does not process the delete of the StaticRoute, which happens later when the static route module processes the delete request.
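
The ordering problem described above can be reproduced in a toy model. This is a minimal sketch, not the contrail-controller code: the `Table` and `Replicator` classes, the prefix, and the route-target strings are all hypothetical stand-ins chosen for illustration.

```python
class Table:
    """Toy route table: prefix -> set of route targets, plus listeners."""

    def __init__(self, name):
        self.name = name
        self.routes = {}
        self.listeners = []

    def delete_route(self, prefix):
        self.routes.pop(prefix, None)
        # Only listeners that are still registered hear about the delete.
        for listener in list(self.listeners):
            listener.on_route_delete(prefix)


class Replicator:
    """Toy replicator that copies routes from src into dest."""

    def __init__(self, src, dest):
        self.src, self.dest = src, dest
        src.listeners.append(self)

    def replicate(self, prefix):
        self.dest.routes[prefix] = set(self.src.routes[prefix])

    def on_route_delete(self, prefix):
        self.dest.routes.pop(prefix, None)

    def leave_and_unregister(self, left_targets):
        # The walk removes replicas only for the targets being left.
        for prefix, targets in list(self.src.routes.items()):
            if targets & left_targets:
                self.dest.routes.pop(prefix, None)
        # Buggy step: unregister on walk completion, unconditionally.
        self.src.listeners.remove(self)


def run_buggy_sequence():
    src = Table("internal-instance.inet.0")
    dest = Table("destination-vrf.inet.0")
    replicator = Replicator(src, dest)

    # Static route replicated via a target that is NOT in export_rt.
    src.routes["10.1.1.0/24"] = {"target:64512:100"}
    replicator.replicate("10.1.1.0/24")

    # Instance delete: the walk covers export_rt only, then unregisters.
    replicator.leave_and_unregister(left_targets={"target:64512:1"})

    # The static route module processes its config delete afterwards.
    src.delete_route("10.1.1.0/24")
    return src, dest
```

Calling `run_buggy_sequence()` leaves the prefix absent from the source table but still present in the destination table, which is the leaked replicated route.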

Proposed fix: (Testing)
RouteReplicator keeps track of the replicated routes/DBStates it has added, and unregisters from the DBTable only after all DBStates are cleared (that is, after all replicated routes from this table are deleted).
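
A minimal sketch of that idea, assuming a replicator that simply remembers which prefixes it has replicated. `SourceTable`, `TrackingReplicator`, and their method names are hypothetical and do not correspond to the actual classes in the fix.

```python
class SourceTable:
    """Toy table that only tracks its registered listeners."""

    def __init__(self):
        self.listeners = set()


class TrackingReplicator:
    """Defers unregistration until no replicated state remains."""

    def __init__(self, table):
        self.table = table
        self.replicated = set()   # prefixes this replicator has replicated
        self.walk_done = False
        table.listeners.add(self)

    def route_replicated(self, prefix):
        self.replicated.add(prefix)

    def route_deleted(self, prefix):
        self.replicated.discard(prefix)
        self._maybe_unregister()

    def walk_complete(self):
        self.walk_done = True
        self._maybe_unregister()

    def _maybe_unregister(self):
        # Unregister only when the walk is over AND nothing is left behind.
        if self.walk_done and not self.replicated:
            self.table.listeners.discard(self)

    @property
    def registered(self):
        return self in self.table.listeners
```

With this shape, a walk that completes while a static route is still replicated leaves the listener registered, so the later delete is still seen and the listener is released only then.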

Changed in opencontrail:
assignee: nobody → Prakash Bailkeri (prakashmb)
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.0

Review in progress for https://review.opencontrail.org/12924
Submitter: Prakash Bailkeri (<email address hidden>)

tags: added: contrail-control
Changed in opencontrail:
importance: Undecided → High
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Prakash Bailkeri (prakashmb)
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/12993
Submitter: Prakash Bailkeri (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/12994
Submitter: Prakash Bailkeri (<email address hidden>)

summary: - Rapid network create and delete causes a race condition
+ SYMC:Rapid network create and delete causes a race condition
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12993
Committed: http://github.org/Juniper/contrail-controller/commit/0fabac36a6e09b8d4ffde6c5e28e94a073dd2bf6
Submitter: Zuul
Branch: R2.20

commit 0fabac36a6e09b8d4ffde6c5e28e94a073dd2bf6
Author: Prakash M Bailkeri <email address hidden>
Date: Wed Aug 12 12:46:59 2015 +0530

Routepath Replicator should unregister from BgpTable only after all replicated routes are deleted

When a routing instance is deleted, the route path replicator walks the route
table as part of the Leave of the RtGroup for its import and export route
targets. During the walk, it deletes all the replicated paths/routes.
On walk completion, it unregisters from the DBTable. In the static route
scenario, the static route is added on the internal routing instance that has
the "static-route-entries" property attached to it.
The generated static route is replicated to the destination VRF based on the
"route-target-list" config in "static-route-entries".

Note: such an internal routing instance doesn't have these route targets in its
export_rt.

If this internal routing instance is deleted, the route path replicator starts
the table walk as part of the Leave of the RtGroup in import and export. If the
static route module has not yet processed the config delete of the static route
entries, it will not have deleted the static route it added to the inet route
table. On walk completion, the replicator unregisters from the routing table.
Hence the replicated routes of the static route are never deleted: the
RouteReplicator module is no longer a table listener, so it does not process
the delete of the StaticRoute, which happens later when the static route module
processes the delete request.

Proposed fix:
1. Implement LifeTimeActor in TableState to manage unregistering the listener
and deleting the TableState object. The TableState object takes a delete
reference to the BgpTable, and a delete is attempted only if the BgpTable is
deleted. TableState can be deleted if the GroupList is empty, the replicated
route count is zero, and the table has no pending table walks.

2. Implement a GetDBStateCount() API in the DBTableBase class to fetch the
DBState count for a given listener.
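
The deletion conditions listed in the two points above can be written as a small predicate. This is a sketch under stated assumptions: `TableStateModel`, its field names, and `db_state_count` are hypothetical Python stand-ins for the C++ TableState and GetDBStateCount() named in the commit message, whose real signatures are not shown in this report.

```python
from dataclasses import dataclass, field


@dataclass
class TableStateModel:
    """Hypothetical model of the replicator's per-table state."""
    table_deleted: bool = False
    group_list: set = field(default_factory=set)
    replicated_route_count: int = 0
    pending_walks: int = 0

    def may_delete(self):
        # All three conditions from the commit message must hold.
        return (not self.group_list
                and self.replicated_route_count == 0
                and self.pending_walks == 0)

    def should_attempt_delete(self):
        # A delete is attempted only once the table itself is deleted.
        return self.table_deleted and self.may_delete()


def db_state_count(entries, listener_id):
    """Count entries that still carry state for the given listener.

    `entries` is an iterable of dicts mapping listener id -> state,
    a simplified picture of per-entry DBState storage.
    """
    return sum(1 for states in entries if listener_id in states)
```

For example, a table with one entry holding state for listener 0 (as in the gdb dump earlier in this report) gives a count of 1, so a state modelled this way would not yet be deletable.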

Added unit test code to Static route test and route replication test to
simulate the error condition and validate the fix

Change-Id: I9eb2b94aef9e112e29dea73fde5d38808e3b18b0
Closes-bug: #1482277

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/12924
Committed: http://github.org/Juniper/contrail-controller/commit/a0d84422df11f7ba7737c9fa1f5835b4b431ade9
Submitter: Zuul
Branch: R2.0

commit a0d84422df11f7ba7737c9fa1f5835b4b431ade9
Author: Prakash M Bailkeri <email address hidden>
Date: Fri Aug 7 16:18:14 2015 +0530

Routepath Replicator should unregister from BgpTable only after all replicated routes are deleted

Proposed fix:
RouteReplicator keeps track of the replicated routes/DBStates it has added, and
unregisters from the DBTable only after all DBStates are cleared (that is,
after all replicated routes from this table are deleted).

Added unit test code to Static route test and route replication test to
simulate the error condition and validate the fix

Change-Id: I5ddb05425401a36bf117a7971a3ab7758494d39b
Closes-bug: #1482277

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13052
Submitter: Nischal Sheth (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13056
Submitter: Prakash Bailkeri (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13052
Committed: http://github.org/Juniper/contrail-controller/commit/2157d4a889f999596e22552669ad9d63561568e1
Submitter: Zuul
Branch: R2.20

commit 2157d4a889f999596e22552669ad9d63561568e1
Author: Nischal Sheth <email address hidden>
Date: Thu Aug 13 17:56:17 2015 -0700

Add another test for replicator TableState deletion

Change-Id: Id07dd046e4ab10463cca83cbf7969c36a15e98c3
Closes-Bug: 1482277

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13064
Submitter: Nischal Sheth (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/12994
Submitter: Nischal Sheth (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12994
Committed: http://github.org/Juniper/contrail-controller/commit/04b0b74070db26fdf03eff62ce186b5bfd5fe567
Submitter: Zuul
Branch: master

commit 04b0b74070db26fdf03eff62ce186b5bfd5fe567
Author: Prakash M Bailkeri <email address hidden>
Date: Wed Aug 12 12:46:59 2015 +0530

Routepath Replicator should unregister from BgpTable only after all replicated routes are deleted

Change-Id: I9eb2b94aef9e112e29dea73fde5d38808e3b18b0
Closes-bug: #1482277
(cherry picked from commit c7a87367e1eea173c1b73ac599e92a6705197845)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13064
Committed: http://github.org/Juniper/contrail-controller/commit/1a7ca5f7bb2fc1a7e0fdb9787da146e8eaa0b80d
Submitter: Zuul
Branch: master

commit 1a7ca5f7bb2fc1a7e0fdb9787da146e8eaa0b80d
Author: Nischal Sheth <email address hidden>
Date: Thu Aug 13 17:56:17 2015 -0700

Add another test for replicator TableState deletion

Change-Id: Id07dd046e4ab10463cca83cbf7969c36a15e98c3
Closes-Bug: 1482277

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13056
Committed: http://github.org/Juniper/contrail-controller/commit/96e3d718666249b38c7546c9241e75d87369bb9f
Submitter: Zuul
Branch: master

commit 96e3d718666249b38c7546c9241e75d87369bb9f
Author: Prakash M Bailkeri <email address hidden>
Date: Fri Aug 14 11:51:58 2015 +0530

Wrong port id in floatingip-show of floatingip assigned to virtual-ip port

In case of floating ip on the Virtual-ip, svc-monitor will link floating ip to
"right" interface of service VMs launched by ha-proxy service instance.

Ignore such VMI while walking the vmi_ref from floatingip object.
This is done based on the service interface type of the interface.
Right interface of the service instance will have this property value set to "right"

Change-Id: Icb8cc874da4d18c4def631566fc7e2257f2f5fe4
Closes-bug: #1482277

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13338
Submitter: Numan Siddique (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13338
Committed: http://github.org/Juniper/contrail-neutron-plugin/commit/9fca28bb3a11c67c82f538a6706a660384636074
Submitter: Zuul
Branch: master

commit 9fca28bb3a11c67c82f538a6706a660384636074
Author: Numan Siddique <email address hidden>
Date: Wed Aug 26 17:05:51 2015 +0530

Wrong port id in fip-show of floatingip assigned to virtual-ip

In case of floating ip on the Virtual-ip, svc-monitor will link floating ip to
"right" interface of service VMs launched by ha-proxy service instance.

Ignore such VMI while walking the vmi_ref from floatingip object.
This is done based on the service interface type of the interface.
Right interface of the service instance will have this property value set to "right"

(cherry picked from commit 96e3d718666249b38c7546c9241e75d87369bb9f
of contrail-controller)

Change-Id: Ia4ed307c4a5691411bd352a34e26c167c299f118
Closes-bug: #1482277

Changed in opencontrail:
status: New → Fix Released
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13927
Submitter: Vinay Vithal Mahuli (<email address hidden>)
