On supervisor-vrouter restart, tunnel dest ip of BMS MAC is reset to 0.0.0.0

Bug #1457355 reported by Vedamurthy Joshi
This bug affects 1 person
Affects               Status          Importance   Assigned to    Milestone
Juniper Openstack     Status tracked in Trunk
  R2.20               Fix Committed   High         Manish Singh
  Trunk               Fix Committed   High         Manish Singh

Bug Description

R2.20, Ubuntu 14.04, multinode Juno setup

In the setup below, supervisor-vrouter was restarted on all three compute nodes.
Afterwards, the tunnel nexthop for BMS MAC 00:00:00:00:00:01 was set to 0.0.0.0 instead of 99.99.99.99, and the VXLAN ID was set to 0 instead of 4.

Manish is aware of the issue.
http://nodek3:8085/Snh_EvpnRouteReq?x=1 shows 00:00:00:00:00:01 with tunnel dest IP 99.99.99.99 and VXLAN ID 0.

http://nodek3:8085/Snh_Layer2RouteReq?x=1 shows 00:00:00:00:00:01 with tunnel dest IP 0.0.0.0 and VXLAN ID 0.

Logs and gcores of the vrouter-agent and tor-agents will be in http://10.204.216.50/Docs/bugs/#

env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6],
    'cfgm': [host1,host2,host3],
    'openstack': [host2],
    'control': [host1,host2,host3],
    'compute': [host4,host5, host6],
    'collector': [host1,host2,host3],
    'webui': [host1],
    'database': [host1,host2,host3],
    'toragent': [host6],
    'tsn': [host6],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodec1', 'nodec2', 'nodec3', 'nodek1', 'nodek2', 'nodek3']
}

Tags: bms vrouter
Manish Singh (manishs) wrote :

Also, the label was interpreted incorrectly.

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10649
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10649
Committed: http://github.org/Juniper/contrail-controller/commit/f8f09e3f94590333a3f1a38acff47aeb834c1f18
Submitter: Zuul
Branch: R2.20

commit f8f09e3f94590333a3f1a38acff47aeb834c1f18
Author: Manish <email address hidden>
Date: Thu May 21 14:51:40 2015 +0530

MAC route in agent had an invalid label and destination NH.

Problem:
Two issues were seen:
1) The label was wrongly interpreted as an MPLS label even though it was a VXLAN
ID, and this was never corrected. The EVPN route was received before the global
vrouter config. Encapsulation priorities were therefore absent in the agent, so
the EVPN route was programmed with MPLS-over-GRE (the control node had sent
VXLAN). Since MPLS encap was chosen, the label was interpreted as an MPLS label
and not as a VXLAN ID. When the global vrouter config was received, the route
was resynced. The resync also failed to change the encap to VXLAN (even though
VXLAN was now available in the priorities), because that decision is based on
the VXLAN ID: a VXLAN ID of 0 is invalid, so the encap defaults to MPLS. In this
case the VXLAN ID was 0, as explained above, so the encap remained MPLS.
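The sequence in issue 1 can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the agent's C++ code: the encapsulation strings, the use of 0 as the invalid value, and the helpers compute_tunnel_type, old_add_route and old_resync are all hypothetical. It shows why the route stays on MPLS with VXLAN ID 0 after the resync.

```python
from typing import Dict, List, Optional, Set

MPLS = "MPLSoGRE"   # assumed encapsulation names, for illustration only
VXLAN = "VXLAN"
INVALID = 0         # assumption: 0 marks an invalid MPLS label / VXLAN ID


def compute_tunnel_type(priorities: Optional[List[str]], sent: Set[str]) -> str:
    """Pick the first locally preferred encap that the control node also sent.

    With no encapsulation priorities (global vrouter config not yet
    received), default to MPLS.
    """
    for encap in priorities or []:
        if encap in sent:
            return encap
    return MPLS


def old_add_route(label: int, priorities: Optional[List[str]], sent: Set[str]) -> Dict:
    """Old behaviour: interpret the label purely from the computed tunnel type."""
    tunnel = compute_tunnel_type(priorities, sent)
    if tunnel == VXLAN:
        return {"tunnel": tunnel, "mpls_label": INVALID, "vxlan_id": label}
    return {"tunnel": tunnel, "mpls_label": label, "vxlan_id": INVALID}


def old_resync(route: Dict, priorities: Optional[List[str]], sent: Set[str]) -> Dict:
    """Resync: VXLAN is chosen only if the stored VXLAN ID is valid."""
    tunnel = compute_tunnel_type(priorities, sent)
    if tunnel == VXLAN and route["vxlan_id"] == INVALID:
        tunnel = MPLS  # VXLAN ID 0 is treated as invalid, so fall back to MPLS
    return {**route, "tunnel": tunnel}


# EVPN route (VXLAN ID 4, only VXLAN sent) arrives before the global vrouter config.
route = old_add_route(4, None, {VXLAN})
# Global vrouter config arrives with VXLAN first in the priorities; resync runs.
route = old_resync(route, [VXLAN, MPLS], {VXLAN})
print(route)  # {'tunnel': 'MPLSoGRE', 'mpls_label': 4, 'vxlan_id': 0}
```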

2) The nexthop differed between the EVPN route and the derived MAC route. This
happened because the path sync of the EVPN route returned false even though an
NH change was seen, so the MAC route was not rebaked. The return value was false
because the true value set by ChangeNh was overwritten by MplsChange.

Solution:
For case 1): if the control node's message carries only VXLAN encap, use the
label as the VXLAN ID and set the MPLS label to invalid, even though the tunnel
type is computed as MPLS (encapsulation priorities have not been received). If
only MPLS encap is sent, use the label as the MPLS label and reset the VXLAN ID
to invalid. If both VXLAN and MPLS are sent, fall back to the old approach of
interpreting the label based on the computed tunnel type.
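The three-way rule for case 1 can be sketched as follows. It is a hypothetical Python model (interpret_label, resync, the encap strings and the 0-as-invalid convention are assumptions, not agent APIs). Because the label is now stored as VXLAN ID 4 even while the tunnel is MPLS, the unchanged resync rule can switch the route to VXLAN once the priorities arrive.

```python
from typing import Dict, List, Optional, Set

MPLS = "MPLSoGRE"   # assumed encapsulation names, for illustration only
VXLAN = "VXLAN"
INVALID = 0         # assumption: 0 marks an invalid MPLS label / VXLAN ID


def compute_tunnel_type(priorities: Optional[List[str]], sent: Set[str]) -> str:
    """First locally preferred encap also sent by the control node; else MPLS."""
    for encap in priorities or []:
        if encap in sent:
            return encap
    return MPLS


def interpret_label(label: int, priorities: Optional[List[str]], sent: Set[str]) -> Dict:
    tunnel = compute_tunnel_type(priorities, sent)
    if sent == {VXLAN}:
        # Only VXLAN sent: the label is a VXLAN ID, even if the tunnel is MPLS.
        mpls_label, vxlan_id = INVALID, label
    elif VXLAN not in sent:
        # Only MPLS encaps sent: the label is an MPLS label.
        mpls_label, vxlan_id = label, INVALID
    elif tunnel == VXLAN:
        # Both sent: fall back to the computed tunnel type (old approach).
        mpls_label, vxlan_id = INVALID, label
    else:
        mpls_label, vxlan_id = label, INVALID
    return {"tunnel": tunnel, "mpls_label": mpls_label, "vxlan_id": vxlan_id}


def resync(route: Dict, priorities: Optional[List[str]], sent: Set[str]) -> Dict:
    """Same resync rule as before: VXLAN only with a valid VXLAN ID."""
    tunnel = compute_tunnel_type(priorities, sent)
    if tunnel == VXLAN and route["vxlan_id"] == INVALID:
        tunnel = MPLS
    return {**route, "tunnel": tunnel}


route = interpret_label(4, None, {VXLAN})       # before global vrouter config
route = resync(route, [VXLAN, MPLS], {VXLAN})   # after it arrives
print(route)  # {'tunnel': 'VXLAN', 'mpls_label': 0, 'vxlan_id': 4}
```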

For case 2): fix the return value so that the NH change reported by ChangeNh is
not overwritten by MplsChange.
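The case 2 bug is a familiar pattern: a later assignment overwrites an earlier "changed" result. A minimal Python sketch, in which change_nh and mpls_change are hypothetical stand-ins for ChangeNh and MplsChange and a path is modelled as a dict:

```python
def change_nh(path: dict, nh: str) -> bool:
    """Update the path's nexthop; return True if it changed."""
    if path["nh"] == nh:
        return False
    path["nh"] = nh
    return True


def mpls_change(path: dict, label: int) -> bool:
    """Update the path's label; return True if it changed."""
    if path["label"] == label:
        return False
    path["label"] = label
    return True


def sync_buggy(path: dict, nh: str, label: int) -> bool:
    ret = change_nh(path, nh)
    ret = mpls_change(path, label)  # BUG: overwrites True from change_nh
    return ret


def sync_fixed(path: dict, nh: str, label: int) -> bool:
    ret = False
    if change_nh(path, nh):
        ret = True
    if mpls_change(path, label):  # evaluated even if the NH already changed
        ret = True
    return ret


# NH changes, label does not: only the fixed version reports a change,
# so only it would trigger the rebake of the derived MAC route.
print(sync_buggy({"nh": "0.0.0.0", "label": 4}, "99.99.99.99", 4))  # False
print(sync_fixed({"nh": "0.0.0.0", "label": 4}, "99.99.99.99", 4))  # True
```

The fixed version deliberately avoids `ret = change_nh(...) or mpls_change(...)`, since short-circuit evaluation would skip the label update whenever the nexthop changed.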

Change-Id: Ibeeb3de16d618ecb931c35d8937591d9c9f7f15e
Closes-bug: 1457355

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11493
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/11496
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11496
Committed: http://github.org/Juniper/contrail-controller/commit/bf9e76793aab5870da98bf49b5fb6c6eed82625f
Submitter: Zuul
Branch: master

commit bf9e76793aab5870da98bf49b5fb6c6eed82625f
Author: Manish <email address hidden>
Date: Thu May 14 17:23:05 2015 +0530

Handle error from XMPP session send.

Problem:
If an XMPP channel send fails, which internally (TCP send) translates into a
deferred send operation, the agent controller module treated it as a failure.
For a route update, as seen in this bug, any such deferral resulted in the
route's exported flag being set to false. Because the flag was false, the
unsubscribe for the route was not sent when the route was later deleted.
However, the route update had only been deferred and would have gone out
eventually, so the control node never received the delete, which caused the
issue described in the bug.
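A toy Python model of this failure, for illustration only: SendResult, AgentController, the "subscribe"/"unsubscribe" message tuples and the example prefix are hypothetical, and the first send is forced to be deferred. With the old behaviour (trust_send_result=True) the control node sees the subscribe but never the unsubscribe; with the quick fix it sees both.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]


class SendResult(Enum):
    SENT = "sent"
    DEFERRED = "deferred"  # socket busy: data buffered and sent later


class AgentController:
    """Toy model of how the send result drives the route's exported flag."""

    def __init__(self, send: Callable[[Message], SendResult], trust_send_result: bool):
        self.send = send
        self.trust_send_result = trust_send_result  # True = behaviour before the fix
        self.exported: Dict[str, bool] = {}

    def export_route(self, prefix: str) -> None:
        result = self.send(("subscribe", prefix))
        if self.trust_send_result:
            # Before the fix: a deferred send was treated as a failure.
            self.exported[prefix] = result is SendResult.SENT
        else:
            # Quick fix: assume the send succeeds (it was only deferred).
            self.exported[prefix] = True

    def delete_route(self, prefix: str) -> None:
        # The unsubscribe is only sent for routes marked as exported.
        if self.exported.pop(prefix, False):
            self.send(("unsubscribe", prefix))


def run(trust_send_result: bool) -> List[Message]:
    """Export then delete one route; the first send is deferred."""
    wire: List[Message] = []  # everything the control node eventually receives
    results = iter([SendResult.DEFERRED])

    def send(msg: Message) -> SendResult:
        wire.append(msg)  # deferred data still goes out eventually
        return next(results, SendResult.SENT)

    agent = AgentController(send, trust_send_result)
    agent.export_route("10.1.1.0/24")
    agent.delete_route("10.1.1.0/24")
    return wire


print(run(trust_send_result=True))   # [('subscribe', '10.1.1.0/24')]
print(run(trust_send_result=False))  # [('subscribe', '10.1.1.0/24'), ('unsubscribe', '10.1.1.0/24')]
```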

Solution:
Ideally, a failed send should cause further requests to the control node to be
enqueued and replayed whenever the callback (writereadycb) is called. This way
the agent will not overload the socket.
However, as a quick fix, the send error is no longer used to decide the
subsequent operation after a send. This assumes that the send will always
succeed. Currently the following messages are sent:
1) VM config sub/unsub
2) Config subscribe for agent
3) VRF sub/unsub
4) Route sub/unsub
The case where the connection is not present is handled by channel flap
handling.
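The "ideal" approach mentioned above, which this commit explicitly did not implement, could look roughly like the Python sketch below. QueuedSender, socket_send and write_ready_cb are hypothetical names (write_ready_cb only loosely mirrors writereadycb), and socket_send is assumed to return False without writing anything when the socket would block. Once a message is queued, later messages queue behind it so ordering is preserved.

```python
from collections import deque
from typing import Callable, Deque


class QueuedSender:
    """Queue messages while the socket would block; replay them on write-ready."""

    def __init__(self, socket_send: Callable[[str], bool]):
        # socket_send returns True if the message was written, False if it
        # would block (nothing written).
        self.socket_send = socket_send
        self.pending: Deque[str] = deque()

    def send(self, msg: str) -> None:
        if self.pending or not self.socket_send(msg):
            # Keep ordering: once something is queued, queue everything behind it.
            self.pending.append(msg)

    def write_ready_cb(self) -> None:
        """Called when the socket becomes writable again."""
        while self.pending:
            if not self.socket_send(self.pending[0]):
                return  # still blocked; wait for the next callback
            self.pending.popleft()


# Usage with a fake socket that starts out blocked.
written = []
writable = [False]


def fake_socket_send(msg: str) -> bool:
    if writable[0]:
        written.append(msg)
        return True
    return False


sender = QueuedSender(fake_socket_send)
sender.send("route subscribe")
sender.send("route unsubscribe")
writable[0] = True
sender.write_ready_cb()
print(written)  # ['route subscribe', 'route unsubscribe']
```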

Change-Id: Ib6e0856b5c689b51209add4ab459b8bd2e952143
Closes-bug: 1453483
(cherry picked from commit 0a70915fd3bc1954154e657f59123d1a4597f2a4)

VRF state not deleted in DelPeer walk.

The problem statement is the same as in this commit:
https://github.com/Juniper/contrail-controller/commit/8e302fcb991c8f5d8f5defb85b9851f8cde5f479

However, the above commit does not solve the issue. The walk count was
incremented on enqueue of a walk, but when the walk is processed it calls Cancel
for any previously started walk, and this Cancel decrements the walk count. This
defeats the purpose of moving the walk count increment to enqueue in the above
commit.
Also, consider a walk for a VRF with four route tables. This should result in a
walk count of 5 (1 for the VRF table + 4 for the route tables). With the above
fix it will be 2 (1 for the VRF + 1 for the route tables). That fix did not take
into consideration that the route walk count needs to be incremented for each
route table.

Solution:
Use a separate enqueue walk count and restore walk_count to how it was before
the above commit. Use both counts to check whether the walk is done.
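The two-counter bookkeeping can be sketched in Python. This is a hypothetical model, not the agent's code: VrfWalkTracker and its methods are illustrative names, and Cancel() is modelled as dropping all outstanding table walks of the previous walk. It also reproduces the expected count of 5 for a VRF with four route tables.

```python
class VrfWalkTracker:
    """Hypothetical model of walk_count plus a separate enqueue count."""

    def __init__(self, num_route_tables: int):
        self.num_route_tables = num_route_tables
        self.walk_count = 0          # table walks currently running
        self.enqueue_walk_count = 0  # walk requests queued but not yet started

    def enqueue(self) -> None:
        self.enqueue_walk_count += 1

    def start(self) -> None:
        """Process a queued request: cancel any running walk, then start a new one."""
        self.enqueue_walk_count -= 1
        self.walk_count = 0  # Cancel() of the previous walk drops its table walks
        # One walk for the VRF table plus one per route table.
        self.walk_count += 1 + self.num_route_tables

    def table_done(self) -> None:
        self.walk_count -= 1

    def walk_done(self) -> bool:
        # Done only when nothing is running AND nothing is still queued.
        return self.walk_count == 0 and self.enqueue_walk_count == 0


t = VrfWalkTracker(num_route_tables=4)
t.enqueue()
print(t.walk_done())  # False: a walk is queued even though none is running
t.start()
print(t.walk_count)   # 5
for _ in range(5):
    t.table_done()
print(t.walk_done())  # True
```

Because Cancel() can only touch walk_count, a request that is still queued keeps the walk from being reported as done.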

Closes-bug: 1455862
Change-Id: I8d96732375f649e70d6754cb6c5b34c24867ce0c
(cherry picked from commit 705165854bbdaff9c47a2a9443410eec53e4fb37)

Multicast route gets deleted when vxlan id is changed in configured mode

Problem:
In oper multicast, if the local peer's VXLAN ID is changed, an add was issued
for the route with the new VXLAN ID and a delete was issued for the same route
with the old VXLAN ID. Since the peer is local, the path search compares only
the peer and not the VXLAN ID. This results in deletion of the local path and
eventually of the multicast route.
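A small Python model of why the delete removes the freshly updated path. Path, MulticastRoute and the "local" peer name are hypothetical; the only behaviour taken from the report is that the local-peer path search matches on the peer alone. The fixed handling shown here covers only the "just trigger an update" part, since the rest of the solution text is truncated in this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    peer: str
    vxlan_id: int


@dataclass
class MulticastRoute:
    paths: List[Path] = field(default_factory=list)

    def find_path(self, peer: str) -> Optional[Path]:
        # For the local peer, the search compares only the peer, not the VXLAN ID.
        return next((p for p in self.paths if p.peer == peer), None)

    def add_or_update(self, peer: str, vxlan_id: int) -> None:
        path = self.find_path(peer)
        if path is None:
            self.paths.append(Path(peer, vxlan_id))
        else:
            path.vxlan_id = vxlan_id

    def delete(self, peer: str, vxlan_id: int) -> None:
        path = self.find_path(peer)  # vxlan_id plays no part in the match
        if path is not None:
            self.paths.remove(path)

    def is_deleted(self) -> bool:
        return not self.paths  # route goes away with its last path


# Old handling of a VXLAN ID change 4 -> 5: add with new ID, delete with old ID.
old = MulticastRoute()
old.add_or_update("local", 4)
old.add_or_update("local", 5)
old.delete("local", 4)
print(old.is_deleted())  # True: the delete matched the freshly updated path

# Fixed handling: only update the existing path.
fixed = MulticastRoute()
fixed.add_or_update("local", 4)
fixed.add_or_update("local", 5)
print(fixed.is_deleted(), fixed.paths[0].vxlan_id)  # False 5
```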

Solution:
There is no need to withdraw the path from the local peer on a VXLAN ID change;
just trigger an update of the same path. This will result in controller route _ex...
