Traffic from/to BMS doesn't work in redundant MX topology

Bug #1498046 reported by amit surana
This bug affects 2 people
Affects              Status   Importance   Assigned to   Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.20              New      High         Praveen
  Trunk              New      High         Praveen

Bug Description

2.21 build 97.

This bug is the exact same test scenario as described in:

1485804 vRouter responds to ARP req for default GW from BMS with vhost0 MAC

only now the problem has moved downstream. Earlier, the TSN responded with its own vhost0 MAC address to ARP requests from the BMS for the GW IP. Now the TSN correctly floods the ARP request (to the TOR, EVPN and fabric nodes), but the fabric vRouter responds with its own vhost0 MAC address instead of staying quiet and letting the MX resolve the ARP.

TSN:

07:47:49.174524 10:0e:7e:be:79:00 > 90:e2:ba:50:ac:89, ethertype IPv4 (0x0800), length 106: 172.16.183.1.4212 > 172.16.180.9.4789: VXLAN, flags [I] (0x08), vni 126
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42

07:47:49.174600 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.185.1.4789: VXLAN, flags [I] (0x08), vni 126
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42

07:47:49.174621 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.184.200.4789: VXLAN, flags [I] (0x08), vni 8
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42
07:47:49.174625 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.184.200.4789: VXLAN, flags [I] (0x08), vni 126
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42

07:47:49.174628 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.187.200.4789: VXLAN, flags [I] (0x08), vni 8
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42
07:47:49.174632 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.187.200.4789: VXLAN, flags [I] (0x08), vni 126
00:e0:ed:20:fa:53 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has 1.1.1.1 tell 1.1.1.3, length 42

07:47:49.174638 90:e2:ba:50:ac:89 > 90:e2:ba:50:a9:d9, ethertype IPv4 (0x0800), length 138: 172.16.180.9 > 172.16.180.16: GREv0, proto MPLS unicast (0x8847), length 104: MPLS (label 4613, exp 0, [S], ttl 61)
        0x0000: 0000 0000 4500 005c 5c03 0000 4011 5e5a ....E..\\...@.^Z
        0x0010: ac10 b409 ac10 b409 e782 12b5 0048 0000 .............H..
        0x0020: 0800 0000 0000 7e00 ffff ffff ffff 00e0 ......~.........
        0x0030: ed20 fa53 0806 0001 0800 0604 0001 00e0 ...S............
        0x0040: ed20 fa53 0101 0103 0000 0000 0000 0101 ...S............
        0x0050: 0101 0000 0000 0000 0000 0000 0000 0000 ................
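
To make the flood pattern in the TSN capture easier to read, the outer VXLAN headers can be tallied per destination. The following is only a rough sketch: the regular expression, the function name `vnis_per_destination` and the `TSN_IP` constant are illustrative helpers, not part of any Contrail tooling, and the input lines are the outer headers copied from the capture above.

```python
import re
from collections import defaultdict

# Outer header of a VXLAN packet as printed by tcpdump, e.g.
# "172.16.180.9.59266 > 172.16.184.200.4789: VXLAN, flags [I] (0x08), vni 8"
VXLAN_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.4789: VXLAN.*vni (?P<vni>\d+)"
)

TSN_IP = "172.16.180.9"  # tunnel source address of the TSN in this capture


def vnis_per_destination(lines, sender):
    """Map each tunnel destination to the list of VNIs the sender used for it."""
    seen = defaultdict(list)
    for line in lines:
        match = VXLAN_RE.search(line)
        if match and match.group("src") == sender:
            seen[match.group("dst")].append(int(match.group("vni")))
    return dict(seen)


capture = [
    "07:47:49.174524 10:0e:7e:be:79:00 > 90:e2:ba:50:ac:89, ethertype IPv4 (0x0800), length 106: 172.16.183.1.4212 > 172.16.180.9.4789: VXLAN, flags [I] (0x08), vni 126",
    "07:47:49.174600 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.185.1.4789: VXLAN, flags [I] (0x08), vni 126",
    "07:47:49.174621 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.184.200.4789: VXLAN, flags [I] (0x08), vni 8",
    "07:47:49.174625 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.184.200.4789: VXLAN, flags [I] (0x08), vni 126",
    "07:47:49.174628 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.187.200.4789: VXLAN, flags [I] (0x08), vni 8",
    "07:47:49.174632 90:e2:ba:50:ac:89 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.9.59266 > 172.16.187.200.4789: VXLAN, flags [I] (0x08), vni 126",
]

flooded = vnis_per_destination(capture, TSN_IP)
# Destinations that received the same ARP request on more than one VNI.
multi_vni = {dst: vnis for dst, vnis in flooded.items() if len(set(vnis)) > 1}

for dst, vnis in flooded.items():
    print(dst, vnis)
```

With the lines above, `multi_vni` contains 172.16.184.200 and 172.16.187.200, each with VNIs 8 and 126, while 172.16.185.1 receives a single copy on VNI 126.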

compute node:

07:47:49.174711 90:e2:ba:50:ac:89 > 90:e2:ba:50:a9:d9, ethertype IPv4 (0x0800), length 138: 172.16.180.9 > 172.16.180.16: GREv0, proto MPLS unicast (0x8847), length 104: MPLS (label 4613, exp 0, [S], ttl 61)
        0x0000: 0000 0000 4500 005c 5c03 0000 4011 5e5a ....E..\\...@.^Z
        0x0010: ac10 b409 ac10 b409 e782 12b5 0048 0000 .............H..
        0x0020: 0800 0000 0000 7e00 ffff ffff ffff 00e0 ......~.........
        0x0030: ed20 fa53 0806 0001 0800 0604 0001 00e0 ...S............
        0x0040: ed20 fa53 0101 0103 0000 0000 0000 0101 ...S............
        0x0050: 0101 0000 0000 0000 0000 0000 0000 0000 ................

07:47:49.174794 90:e2:ba:50:a9:d9 > 10:0e:7e:be:79:00, ethertype IPv4 (0x0800), length 106: 172.16.180.16.49548 > 172.16.183.1.4789: VXLAN, flags [I] (0x08), vni 126
90:e2:ba:50:a9:d9 > 00:e0:ed:20:fa:53, ethertype ARP (0x0806), length 56: Reply 1.1.1.1 is-at 90:e2:ba:50:a9:d9, length 42

An additional problem I see is that the TSN sends out two ARP requests to each of the MX routers: one on VxLAN 126 and the other on VxLAN 8. This seems problematic as well. nh 112 below is the problematic one.

root@csol2-node9:~# nh --get 22
Id:22 Type:Vrf_Translate Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:2 Vrf:4
              Vrf:4

root@csol2-node9:~# rt --dump 4 --family bridge
Kernel L2 Bridge table 0/4

Flags: L=Label Valid, Df=DHCP flood

Index   DestMac            Flags  Label/VNID  Nexthop
75840   0:0:5e:0:1:1       LDf    8           18
136572  0:e0:ed:20:78:47          -           1
153104  3c:94:d5:e5:37:f0  LDf    126         81
193204  0:e0:ed:20:fa:53   L      126         43
196364  0:0:5e:0:1:0       Df     -           3
206596  ff:ff:ff:ff:ff:ff  LDf    126         53
228528  90:e2:ba:50:ac:89  Df     -           3
229516  2:d4:c4:d2:93:45   LDf    126         41
238524  28:c0:da:fd:2f:f0  LDf    8           18
root@csol2-node9:~# nh --get 53
Id:53 Type:Composite Fmly:AF_BRIDGE Flags:Valid, Multicast, L2, Rid:0 Ref_cnt:4 Vrf:4
              Sub NH(label): 54(0) 112(0) 69(0)

root@csol2-node9:~# nh --get 112
Id:112 Type:Composite Fmly: AF_INET Flags:Valid, Evpn, Rid:0 Ref_cnt:2 Vrf:4
              Sub NH(label): 18(8) 18(126) 81(8) 81(126)

root@csol2-node9:~# nh --get 18
Id:18 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:29 Vrf:0
              Oif:0 Len:14 Flags Valid, Vxlan, Data:10 0e 7e be 79 00 90 e2 ba 50 ac 89 08 00
              Vrf:0 Sip:172.16.180.9 Dip:172.16.184.200

root@csol2-node9:~# nh --get 81
Id:81 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:20 Vrf:0
              Oif:0 Len:14 Flags Valid, Vxlan, Data:10 0e 7e be 79 00 90 e2 ba 50 ac 89 08 00
              Vrf:0 Sip:172.16.180.9 Dip:172.16.187.200

root@csol2-node9:~# nh --get 69
Id:69 Type:Composite Fmly: AF_INET Flags:Valid, Fabric, Rid:0 Ref_cnt:2 Vrf:4
              Sub NH(label): 46(4613)

root@csol2-node9:~# nh --get 46
Id:46 Type:Tunnel Fmly: AF_INET Flags:Valid, MPLSoGRE, Rid:0 Ref_cnt:7 Vrf:0
              Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:90 e2 ba 50 a9 d9 90 e2 ba 50 ac 89 08 00
              Vrf:0 Sip:172.16.180.9 Dip:172.16.180.16

On compute:

root@csol2-node16:~# nh --get 120
Id:120 Type:Composite Fmly:AF_BRIDGE Flags:Valid, Multicast, L2, Rid:0 Ref_cnt:4 Vrf:5
              Sub NH(label): 42(0) 119(0) 47(0) 53(0)

root@csol2-node16:~# nh --get 119
Id:119 Type:Composite Fmly: AF_INET Flags:Valid, Evpn, Rid:0 Ref_cnt:2 Vrf:5
              Sub NH(label): 39(8) 39(126) 117(8) 117(126)

root@csol2-node16:~# nh --get 39
Id:39 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:11 Vrf:0
              Oif:0 Len:14 Flags Valid, Vxlan, Data:10 0e 7e be 79 00 90 e2 ba 50 a9 d9 08 00
              Vrf:0 Sip:172.16.180.16 Dip:172.16.184.200

root@csol2-node16:~# nh --get 117
Id:117 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:7 Vrf:0
              Oif:0 Len:14 Flags Valid, Vxlan, Data:10 0e 7e be 79 00 90 e2 ba 50 a9 d9 08 00
              Vrf:0 Sip:172.16.180.16 Dip:172.16.187.200
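
The duplicate members in the EVPN composite nexthops (nh 112 on the TSN, nh 119 on the compute node) can be spotted mechanically from the "Sub NH(label)" lines. The snippet below is a small sketch for that; the function names are made up for illustration, and it only parses the text printed by `nh --get` as pasted above, it does not query vRouter.

```python
import re
from collections import defaultdict


def labels_by_nexthop(sub_nh_line):
    """Group the labels/VNIs of a 'Sub NH(label): ...' line by member nexthop id."""
    grouped = defaultdict(list)
    # Each member is printed as "<nh id>(<label>)", e.g. "18(126)".
    for nh_id, label in re.findall(r"(\d+)\((\d+)\)", sub_nh_line):
        grouped[int(nh_id)].append(int(label))
    return dict(grouped)


def duplicated_members(sub_nh_line):
    """Return only the member nexthops that appear with more than one label."""
    return {
        nh_id: labels
        for nh_id, labels in labels_by_nexthop(sub_nh_line).items()
        if len(labels) > 1
    }


tsn_nh_53 = "Sub NH(label): 54(0) 112(0) 69(0)"
tsn_nh_112 = "Sub NH(label): 18(8) 18(126) 81(8) 81(126)"
compute_nh_119 = "Sub NH(label): 39(8) 39(126) 117(8) 117(126)"

print(duplicated_members(tsn_nh_53))
print(duplicated_members(tsn_nh_112))
print(duplicated_members(compute_nh_119))
```

For nh 112 this reports tunnel nexthops 18 and 81 (the two MX tunnels) each listed with VNIs 8 and 126, which lines up with the two copies per MX seen in the TSN capture.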

Tags: vrouter
amit surana (asurana-t)
description: updated
Hari Prasad Killi (haripk) wrote :
amit surana (asurana-t) wrote :

Not entirely sure that the two issues are the same. Probably needs more investigation. I will provide access to my setup to debug further.

In this bug, it's the inter-VxLAN pings between BMSs that fail, because the ARP for the GW IP is not resolved correctly. Intra-VN pings between BMSs and between BMS and VM work fine. In 1491644 (which does not have a TSN in the topology), intra-VN pings between BMSs are failing.

Furthermore, if there are no VMs in the topology, intra-VN BMS pings succeed, because the TSN then floods the ARP only to the TORs and MX(s).

Nischal Sheth (nsheth)
tags: added: vrouter
Divakar Dharanalakota (ddivakar) wrote :

When BMSs are behind an MX and the network has MX redundancy, a ping from one BMS to another poisons the first BMS's ARP cache entry for the second BMS with the MAC of a vRouter compute node, which breaks connectivity between the two BMSs.

Root cause:
When the ARP request from BMS1 is flooded to a compute node by the MX, the vRouter looks up the source (BMS) IP in the inet table. The lookup hits the subnet route, which points to an ECMP nexthop over the two MXs. As a result the vRouter replies with the vhost MAC to force the packets into L3 processing, even though the ARP request is not meant for any VM on that compute node.
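
The decision described in the root cause can be modelled in a few lines. This is a toy sketch, not vRouter code: the `Route` class, the `nexthop_kind` strings, the function names and the 1.1.1.0/24 prefix are all assumptions made for illustration (the report only shows the addresses 1.1.1.1 and 1.1.1.3), and only the single branch named in the root cause is modelled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    nexthop_kind: str  # hypothetical tag, e.g. "ecmp"


def longest_prefix_match(routes, ip):
    """Return the most specific route covering ip, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    matches = [route for route in routes if addr in route.prefix]
    return max(matches, key=lambda route: route.prefix.prefixlen, default=None)


def arp_action_as_described(routes, sender_ip):
    """Model the behaviour from the root cause: the ARP sender's IP is looked up
    in the inet table, and a hit on a route with an ECMP nexthop triggers a
    reply carrying the vhost MAC. Every other case is left unmodelled."""
    route = longest_prefix_match(routes, sender_ip)
    if route is not None and route.nexthop_kind == "ecmp":
        return "reply_with_vhost_mac"
    return "not_modelled"


# Assumed inet table: one subnet route whose nexthop is the ECMP over both MXs.
inet_table = [Route(ipaddress.ip_network("1.1.1.0/24"), "ecmp")]

action = arp_action_as_described(inet_table, "1.1.1.3")
print(action)
```

In this model the request from 1.1.1.3 lands on the ECMP subnet route, so the compute node answers with its own MAC, matching the "Reply 1.1.1.1 is-at 90:e2:ba:50:a9:d9" seen in the compute node capture.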

amit surana (asurana-t)
information type: Proprietary → Public