Comment 22 for bug 1518501

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16074
Committed: http://github.org/Juniper/contrail-controller/commit/a56f464f0724811954e241fc33f07a0526a86311
Submitter: Zuul
Branch: master

commit a56f464f0724811954e241fc33f07a0526a86311
Author: Manish <email address hidden>
Date: Tue Dec 8 22:59:03 2015 +0530

Mpls label from index vector not reused.

Problem:
In multicast evpn label, allocation happens from regular unicast range. When
this label is deleted it is not reset in index vector(Index vector manages
labels).
This used to result in freed label not being re-used by new allocations.

Solution:
Multicast has two sets of labels. First is fabric replication labels which are
reserved at init itself and remains till end of life.
Second set ie evpn label as mentioned above is allocated from non reserved
range. Both these labels are of type MCAST_NH. Now when evpn label is freed the
destructor used to check for MCAST_NH and skip de-allocation from index vector.
Fix is to check if mcast label is from fabric label reserved range and then
refrain from de-allocation. In all other case do the de-allocation.

Closes-bug: 1518501

Conflicts:
 src/vnsw/agent/cmn/agent.cc

Conflicts:
 src/vnsw/agent/cmn/agent.cc
 src/vnsw/agent/controller/controller_init.h

Change-Id: Idd584a84d30f768818de30bfad30a1d1d811447c
(cherry picked from commit 6389d9cd0106b8fa25c81c6fe373d5e27bfe8d6a)

Agent crash @Filltrace, AgentRouteTable::DeletePathFromPeer

Problem:
DeleteAllBgpPath deletes all path from CN. While deleting there may be some
paths which internally delete dependant paths and hence invalidating path
iterator.
Similar issue was observed here in bridge_route.cc:
https://bugs.launchpad.net/juniperopenstack/+bug/1508894

Solution:
Maintain a list of paths to be deleted and once path list is traversed,
go through this list and delete paths.

Change-Id: I72eedd275ddf8d0c81c5aff10f384b5c45ce3696
Closes-bug: 1524140
(cherry picked from commit 7e51c5e5aa7534840350674fb0ff2b59ecb74bb6)

Agent crash@ invalid path

Problem:
Deleteallbgppeerpath was executed and it will create a list of all paths to be
deleted. This used to include stale and non-stale paths from CN.
Later it will run over the list and delete them. During deletion of path by
fabric_multicast_peer, DeletePath routine also flushes out stale paths.
This is because it assumes that any non bgp peer path delete means route is no
more valid and no need of stale paths.
Now after deleting fabric_multicast_peer path when control returns to delete
remainders, it used to find stale path pointer(which as explained above gets
deleted by
fabric_multicast_peer).

Solution:
Dont push stale paths in to_be_deleted_paths list.
Once all paths are deleted from to_be_deleted_paths, explicitly call for squash
of stale paths, if any left.
Basically isolate deletion of stale and non stael paths.

Change-Id: I2ae025077502713767ac22f64e338e7e7cb50ac6
Closes-bug: 1524140
(cherry picked from commit e8dfd5b94f3ef1c650964d04285e2e4bd02b15b7)

Conflicts:
 src/vnsw/agent/test/test_l2route.cc

Agent crashed because of freed xmpp channel.

Problem:
When agent xmpp channel is deleted, event to bring down bgp peer associated with
same is issued. This in turn starts a walk to clear all states and path
associated with this bgp peer. On finishing walk it unregisters all table
listeners. Meanwhile there can be a notification to controller on deleted peer.
This notification uses channel to verify if peer is deleted and ignore
notification. In this case channel was also deleted and hence crash happens.

Solution:
Defer deletion of channel till bgp peer is deleted.
This will not be executed if channel is not deleted but keeps flapping.

Closes-bug: 1524136

Change-Id: Ia43cbda1d9d2e8f30f30f36165b967230ec39025
(cherry picked from commit 887f5a2572767d881d4c2cb3491e812b30c31724)

Agent crash@ MplsTable::Add

Problem:
Crash was seen at creation of mpls label, where label itself was de-allocated
and re-allocation of same was not done.
This happened for multicast route where same label(evpn) is used across multiple
paths in route - local path, BGP peer(every). If BGP peer path goes away
recompute of master path for multicast used to by default delete label. In same
routine it used to find any other path having evpn label. If any is seen then
creation for same label was enqueued.
After this operation in DB for same label there is a delete and add operation.
Because of yielding policy delete may get executed and de-alloc label. Now when
add is executed agent crashes as label is not allocated.
If both these request get executed at once then this crash will not be seen.

Solution:
Delete evpn label only when non bgp peer path i.e. local vm peer goes off.
Any path with BGP peer will be using agent allocated label.
Along with this make all calls to add/delete mpls via recompute path of
multicast as inline.

Change-Id: I96030915be00ac3587c7e830b3328f7a2aec0dfb
Closes-bug: 1525737
(cherry picked from commit 3dcc27a25357cca0038e162320fbfa326924a432)

Stale label in TSN.

Problem:
In RecomputeMulticastPath evpn label was populated when
local_vm_peer path was not present. This was happening as it was populated from
other paths (tor/evpn) even in absence of local_vm_peer_path. This results in
stale label.
Example: No local vm peer path is present, tor path is present for flood route.
Now fabric path was added. RecomputeMulticastPath will populate evpn_label due
to presence of tor path. Addition of label has a check to add only if evpn label
is not zero and path peer is not BGP. In this case label was populated from tor
and peer is fabric_peer. Hence a stale label is added.

Solution:
Add label only when local vm peer path is added.

Change-Id: Ibc7c89b890ee7ee3a9ccc163b9cce086d278e98f
Closes-bug: 1525737
(cherry picked from commit 57728bb04fd1cd8ae3e94dcc47ca2129cdc0b466)

Conflicts:
 src/vnsw/agent/test/test_l2route.cc

Agent xmpp channel deleted.

Problem:
Statement is same as in- https://review.opencontrail.org/#/c/15743/
This didnt fix the issue completely. There was one case observed where two
requests were enqueued for bgp peer down on same channel. One got processed and
deleted channel before second one was dequeued. In this case channel should have
been retained for second response.

Solution:
Instead of keeping the mentioned fix, use shared_ptr for lifetime of channel.
All BGP peer take reference and agent xmpp channel in agent(class) take
reference. Once all are freed channel gets deleted.

Closes-bug: 1524136

Conflicts:
 src/vnsw/agent/cmn/agent.cc

Change-Id: Id0cdbf737033a491fe1637586f5a85600377079a
(cherry picked from commit a518cb88f8b15c96e8ccc9fdf9a34813daf870d5)