[VXLAN EVPN] Tungsten Fabric controller sends invalid Ethernet Tag ID in EVPN route update

Bug #1781102 reported by Mark Zhu
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)
Series: Trunk
Status: New
Importance: High
Assigned to: Nagendra E S

Bug Description

With Contrail 5.0, I have 4 nodes: 1 OpenStack controller, 1 SDN controller, 1 TSN node, and 1 compute node.
The Cisco N9K has established a BGP EVPN session with the SDN controller and learns the information for all VMs.
A BMS is connected to the Cisco switch, and 2 VMs run on the compute node. The BMS (1.0.0.100) and the VMs (1.0.0.4/1.0.0.5) are in the same VNI, 20000.

Ping the BMS from the VM (1.0.0.5): the ARP request is encapsulated in an MPLS-over-GRE header and forwarded to the TSN node, but the TSN node doesn't forward it to the Cisco switch. If I set a static ARP entry for the BMS on the VM (1.0.0.5), ping works between the BMS and the VM, which means unicast traffic works.

Ping 1.0.0.7 from the BMS (the Cisco N9K has ARP suppression enabled, so I need to send an ARP request for an unknown address for it to reach the TSN node): the BUM traffic is dropped and not forwarded to the VMs.

Some logs on TSN node:

[root@cp1 /]# tcpdump -i eth1 udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
10:21:22.279168 IP 20.20.1.1.37971 > cp1.4789: VXLAN, flags [I] (0x08), vni 20000
ARP, Request who-has 1.0.0.7 tell 1.0.0.100, length 46
10:21:23.280965 IP 20.20.1.1.37971 > cp1.4789: VXLAN, flags [I] (0x08), vni 20000
ARP, Request who-has 1.0.0.7 tell 1.0.0.100, length 46
10:21:24.282930 IP 20.20.1.1.37971 > cp1.4789: VXLAN, flags [I] (0x08), vni 20000
ARP, Request who-has 1.0.0.7 tell 1.0.0.100, length 46
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

[root@cp1 /]# dropstats | grep -v " 0$"
IF Drop 41

Cloned Original 430

Invalid NH 3
Invalid Mcast Source 974

No L2 Route 87

[root@cp1 /]# dropstats | grep -v " 0$"
IF Drop 41

Cloned Original 430

Invalid NH 3
Invalid Mcast Source 976

No L2 Route 87
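
To make the changing counter easier to spot, the two dropstats snapshots above can be compared with a short script. This is only a sketch: it assumes each non-empty line is "counter name, then value" as in the paste, and the function names parse_dropstats and diff_counters are made up for illustration.

```python
def parse_dropstats(text):
    """Parse pasted `dropstats` lines into {counter name: value}."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the trailing number
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0].strip()] = int(parts[1])
    return counters


def diff_counters(before, after):
    """Return only the counters whose value changed between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Snapshots copied from the two runs in this report.
before = parse_dropstats("""
IF Drop 41
Cloned Original 430
Invalid NH 3
Invalid Mcast Source 974
No L2 Route 87
""")
after = parse_dropstats("""
IF Drop 41
Cloned Original 430
Invalid NH 3
Invalid Mcast Source 976
No L2 Route 87
""")

changed = diff_counters(before, after)
print(changed)  # {'Invalid Mcast Source': 2}
```

Only "Invalid Mcast Source" moves between the two runs, so that is the counter the ARP requests from the BMS appear to be hitting.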

[root@cp1 /]# rt --dump 2 --family bridge
Flags: L=Label Valid, Df=DHCP flood, Mm=Mac Moved, L2c=L2 Evpn Control Word, N=New Entry, Ec=EvpnControlProcessing
vRouter bridge table 0/2
Index DestMac Flags Label/VNID Nexthop Stats
31264 0:0:5e:0:1:0 DfEc - 3 0
112924 ff:ff:ff:ff:ff:ff LDfEc 20000 31 1041
115240 2:0:0:0:0:2 DfEc - 12 0
135920 2:a2:c5:f:58:59 LDfEc 20000 28 0
200252 52:54:0:4e:92:48 DfEc - 3 0
205564 2:0:0:0:0:1 DfEc - 12 0
214136 2:b3:a9:87:d8:f6 LDfEc 20000 27 0
245740 8:94:ef:4a:17:47 LEc 20000 21 0
[root@cp1 /]# nh --get 31
Id:31 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:2
              Flags:Valid, Multicast, Etree Root,
              Sub NH(label): 30(0)

Id:30 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:2 Vrf:2
              Flags:Valid, Fabric, Etree Root,
              Sub NH(label): 29(4609)

Id:29 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:3 Vrf:0
              Flags:Valid, MPLSoGRE, Etree Root,
              Oif:0 Len:14 Data:52 54 00 d4 31 65 52 54 00 4e 92 48 08 00
              Sip:10.10.1.3 Dip:10.10.1.4
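
The composite next hop above can be walked programmatically to list the tunnel endpoints that the ff:ff:ff:ff:ff:ff entry floods to. The following is a rough sketch that only understands the fields visible in this paste (Id, Type, Sub NH, Dip); the helper names are hypothetical, and 20.20.1.1 is assumed to be the Cisco VTEP because it is the VXLAN source address in the tcpdump capture.

```python
import re

# `nh --get 31` output copied from this report.
NH_DUMP = """\
Id:31 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:2
              Flags:Valid, Multicast, Etree Root,
              Sub NH(label): 30(0)

Id:30 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:2 Vrf:2
              Flags:Valid, Fabric, Etree Root,
              Sub NH(label): 29(4609)

Id:29 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:3 Vrf:0
              Flags:Valid, MPLSoGRE, Etree Root,
              Oif:0 Len:14 Data:52 54 00 d4 31 65 52 54 00 4e 92 48 08 00
              Sip:10.10.1.3 Dip:10.10.1.4
"""


def parse_nexthops(dump):
    """Map next-hop id -> type, sub next-hop ids, tunnel destination."""
    nexthops = {}
    for block in re.split(r"\n\s*\n", dump.strip()):
        head = re.search(r"Id:(\d+)\s+Type:(\w+)", block)
        if not head:
            continue
        subs = []
        sub_line = re.search(r"Sub NH\(label\):(.*)", block)
        if sub_line:
            # Entries look like "30(0)": next-hop id, then label.
            subs = [int(n) for n in re.findall(r"(\d+)\(-?\d+\)", sub_line.group(1))]
        dip = re.search(r"Dip:(\S+)", block)
        nexthops[int(head.group(1))] = {
            "type": head.group(2),
            "subs": subs,
            "dip": dip.group(1) if dip else None,
        }
    return nexthops


def tunnel_destinations(nexthops, root):
    """Collect the Dip of every tunnel next hop reachable from root."""
    seen, stack, dips = set(), [root], set()
    while stack:
        nh_id = stack.pop()
        if nh_id in seen or nh_id not in nexthops:
            continue
        seen.add(nh_id)
        nh = nexthops[nh_id]
        if nh["type"] == "Tunnel" and nh["dip"]:
            dips.add(nh["dip"])
        stack.extend(nh["subs"])
    return dips


dips = tunnel_destinations(parse_nexthops(NH_DUMP), 31)
print(dips)                 # {'10.10.1.4'}
print("20.20.1.1" in dips)  # False
```

The only flood destination reachable from next hop 31 is the MPLSoGRE tunnel to 10.10.1.4; no tunnel towards 20.20.1.1 is present.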

I see that the Cisco VTEP switch (N9K) is not in the flood list for the ff:ff:ff:ff:ff:ff entry, and I think this is why BUM traffic cannot be forwarded between the compute node and the Cisco switch. Is this a bug or a misconfiguration?

Regards
Mark Zhu

Tags: fabric
Mark Zhu (markzhu)
information type: Proprietary → Public
Jeba Paulaiyan (jebap)
tags: added: fabric
no longer affects: juniperopenstack/r5.0
Mark Zhu (markzhu) wrote :

We found the root cause: the Cisco N9K ignores the EVPN route with Ethernet Tag ID 20000.

Logs from the Cisco N9K:
2018 Jun 27 06:48:41.912928 bgp: 64512 [14572] (default) UPD: [L2VPN EVPN] Received invalid Ethernet Tag ID 20000 from peer 10.240.235.225, ignoring it

Based on RFC 7432, we think the Ethernet Tag ID should be 0 in this case. We changed the Contrail BGP EVPN code and rebuilt the image, and the Cisco switch works with Ethernet Tag ID = 0.

So please confirm whether this is a bug in Contrail, or whether you have a better solution.
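
For anyone who wants to check the tag in a captured update, here is a minimal sketch of where the Ethernet Tag ID sits in an EVPN NLRI. It assumes the affected route is the Inclusive Multicast Ethernet Tag route (route type 3 in RFC 7432), which is a guess based on the BUM symptom; the Cisco log does not name the route type. The route distinguisher and originator address are illustrative values, not taken from a real capture, and the function names are hypothetical.

```python
import ipaddress
import struct

IMET_ROUTE_TYPE = 3  # Inclusive Multicast Ethernet Tag route, RFC 7432 section 7.3


def build_imet_nlri(rd, ethernet_tag, originator_ip):
    """Encode a type 3 NLRI: type, length, RD, Ethernet Tag ID, IP length, IP."""
    ip = ipaddress.ip_address(originator_ip).packed
    # IP Address Length is expressed in bits (32 or 128).
    body = rd + struct.pack("!IB", ethernet_tag, len(ip) * 8) + ip
    return struct.pack("!BB", IMET_ROUTE_TYPE, len(body)) + body


def ethernet_tag_of(nlri):
    """Return the 4-octet Ethernet Tag ID that follows the 8-octet RD."""
    route_type, length = struct.unpack_from("!BB", nlri, 0)
    if route_type != IMET_ROUTE_TYPE:
        raise ValueError("not an Inclusive Multicast Ethernet Tag route")
    if length != len(nlri) - 2:
        raise ValueError("NLRI length field does not match the data")
    (tag,) = struct.unpack_from("!I", nlri, 2 + 8)
    return tag


# Illustrative type 1 RD (IP:number); the values are assumptions.
rd = struct.pack("!H4sH", 1, ipaddress.ip_address("10.10.1.4").packed, 2)

advertised = build_imet_nlri(rd, 20000, "10.10.1.4")  # tag equal to the VNI, as rejected by the N9K
patched = build_imet_nlri(rd, 0, "10.10.1.4")         # tag 0, as in the rebuilt image

print(ethernet_tag_of(advertised))  # 20000
print(ethernet_tag_of(patched))     # 0
```

The two NLRIs differ only in the four octets after the RD, which is the field the N9K log complains about.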

Mark Zhu (markzhu)
summary: - [VXLAN EVPN] TSN can not forward BUM traffic between cisco n9k and
+ [VXLAN EVPN] TSN can not forward L2 BUM traffic between cisco n9k and
compute node
Mark Zhu (markzhu)
summary: - [VXLAN EVPN] TSN can not forward L2 BUM traffic between cisco n9k and
- compute node
+ [VXLAN EVPN] Tungsten Fabric controller send invalid Ethernet Tag ID in
+ EVPN route update