Comment 0 for bug 1653479

kalagesan (kalagesan) wrote :

* version
contrail v3.1.1.0-45
QFX4 v14.1X53-D33
QFX11 v14.1X53-D33.2

* tor-agent and QFX
QFX4(172.23.11.45): tor-agent-4
QFX11(172.23.11.37): tor-agent-11

* topology diagram
[IXIA]===[QFX4]====[VXLAN]====[QFX11]===[IXIA]

The customer configured 500 virtual networks (VNs) on Contrail,
applied traffic to each virtual network, and tested the failure of a TSN node.

When the customer stopped the tor-agent process on one TSN node (172.23.10.196),
communication was lost on 76 of the 500 VNs.

The customer picked one of the VNs that lost communication and checked its BUM tree.
The BUM tree of that VN was broken.

* BUM Tree status
[BUM tree status before stopping tor-agent]
via TSN(172.23.10.196)
QFX4-----(multicast:33:33:33:33:33:33)----->QFX11
QFX4<----(broadcast:ff:ff:ff:ff:ff:ff)------QFX11
*500VN

[Stop tor-agent on TSN(172.23.10.196)]
-----
root@openc-14:~# date
Thu Dec 15 16:42:03 JST 2016
root@openc-14:~# service contrail-tor-agent-4 stop
date
contrail-tor-agent-4: stopped
root@openc-14:~# date
Thu Dec 15 16:42:04 JST 2016
-----

[Expected behavior]
The BUM tree should change to the following path:
via TSN(172.23.10.197)-TSN(172.23.10.196)
QFX4-----(multicast:33:33:33:33:33:33)----->QFX11
QFX4<----(broadcast:ff:ff:ff:ff:ff:ff)------QFX11
*500VN

[Pickup VN1]
The BUM tree of this VN is broken.

Check TSN(172.23.10.197)
root@openc-15:~# vxlan --get 524
VXLAN Table

VNID NextHop
----------------
524 640
root@openc-15:~# nh --get 640
Id:640 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Vxlan, Unicast Flood,
Vrf:1978

root@openc-15:~# rt --family bridge --dump 1978
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/1978
Index DestMac Flags Label/VNID Nexthop
51356 90:1b:e:52:bb:ea Df - 3
296020 30:20:0:8:0:3 - 1
367936 ff:ff:ff:ff:ff:ff LDf 524 13327
909752 30:20:0:8:0:4 - 1
974164 30:48:4:0:0:8 LDf 524 16
root@openc-15:~# nh --get 13327
Id:13327 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:3 Vrf:1978
Flags:Valid, Multicast, L2,
Sub NH(label): 13318(0) 13040(0)

Id:13318 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Tor,
Sub NH(label): 16(524)

Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<QFX4

Id:13040 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Evpn,
Sub NH(label):

There is a path to QFX4.
However, there is no path to the other TSN node (172.23.10.196).
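The check above can be sketched in Python. This is only an illustration, not a Contrail tool: the function names are hypothetical, the sample text is abridged from the `nh --get` output in this report, and it assumes (based on the normal VN shown in this report) that the TSN-to-TSN path appears as a Composite nexthop flagged `Fabric`.

```python
import re

# Composite flags that label each branch of the multicast tree in `nh --get` output.
BRANCH_FLAGS = ("Tor", "Evpn", "Fabric")

def composite_branches(nh_output):
    """Return the branch flags found on Composite nexthops in `nh --get` text."""
    branches = set()
    nh_type = None
    for raw in nh_output.splitlines():
        line = raw.strip()
        header = re.match(r"Id:\s*\d+\s+Type:\s*(\w+)", line)
        if header:
            nh_type = header.group(1)  # remember the type of the current nexthop
        elif line.startswith("Flags:") and nh_type == "Composite":
            flags = {f.strip() for f in line[len("Flags:"):].split(",")}
            branches.update(flags.intersection(BRANCH_FLAGS))
    return branches

def has_fabric_branch(nh_output):
    """Assumption: a missing Fabric composite means no path to the other TSN."""
    return "Fabric" in composite_branches(nh_output)

# Abridged from the VN1 (VNID 524) output in this report.
VN1_NH = """\
Id:13327 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:3 Vrf:1978
Flags:Valid, Multicast, L2,
Sub NH(label): 13318(0) 13040(0)

Id:13318 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Tor,
Sub NH(label): 16(524)

Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45

Id:13040 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Evpn,
Sub NH(label):
"""

print(sorted(composite_branches(VN1_NH)))
print(has_fabric_branch(VN1_NH))
```

Running the same function against the saved `nh --get` output of every VN would give a quick list of the VNs whose tree lacks this branch.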

[Pickup VN2]
The BUM tree of this VN is in a normal state.

Check TSN(172.23.10.197)
root@openc-15:~# vxlan --get 518
VXLAN Table

VNID NextHop
----------------
518 1302
root@openc-15:~# nh --get 1302
Id:1302 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Vxlan, Unicast Flood,
Vrf:2448

root@openc-15:~# rt --family bridge --dump 2448
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/2448
Index DestMac Flags Label/VNID Nexthop
59857 30:20:0:2:0:4 - 1
249460 30:48:4:0:0:2 LDf 518 16
344168 30:48:8:0:0:2 LDf 518 17
446440 30:20:0:2:0:3 - 1
522112 ff:ff:ff:ff:ff:ff LDf 518 12521
722868 90:1b:e:52:bb:ea Df - 3
root@openc-15:~# nh --get 12521
Id:12521 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:2448
Flags:Valid, Multicast, L2,
Sub NH(label): 10434(0) 12254(0) 4507(0)

Id:10434 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Tor,
Sub NH(label): 16(518)

Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<QFX4

Id:12254 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Evpn,
Sub NH(label):

Id:4507 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Fabric,
Sub NH(label): 1584(191914)

Id:1584 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:2047 Vrf:0
Flags:Valid, MPLSoGRE,
Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:90 1b 0e 44 2c b5 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.10.196 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Connection between TSN and TSN

Check TSN(172.23.10.196)
root@openc-14:~# vxlan --get 518
VXLAN Table

VNID NextHop
----------------
518 12002
root@openc-14:~# nh --get 12002
Id:12002 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Vxlan, Unicast Flood,
Vrf:483

root@openc-14:~# rt --family bridge --dump 483
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/483
Index DestMac Flags Label/VNID Nexthop
25892 90:1b:e:44:2c:b5 Df - 3
66856 30:48:4:0:0:2 LDf 518 15
153520 ff:ff:ff:ff:ff:ff LDf 518 17808
289320 30:20:0:2:0:4 - 1
315000 30:20:0:2:0:3 - 1
752656 0:0:5e:0:1:0 Df - 3
785920 30:48:8:0:0:2 LDf 518 16
root@openc-14:~# nh --get 17808
Id:17808 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:483
Flags:Valid, Multicast, L2,
Sub NH(label): 17803(0) 13643(0) 8469(0)

Id:17803 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Tor,
Sub NH(label): 16(518)

Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:1506 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 44 2c b5 08 00
Vrf:0 Sip:172.23.10.196 Dip:172.23.11.37 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<QFX11

Id:13643 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Evpn,
Sub NH(label):

Id:8469 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Fabric,
Sub NH(label): 1555(190816)

Id:1555 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:2047 Vrf:0
Flags:Valid, MPLSoGRE,
Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:90 1b 0e 52 bb ea 90 1b 0e 44 2c b5 08 00
Vrf:0 Sip:172.23.10.196 Dip:172.23.10.197 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Connection between TSN and TSN
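Each check above starts by finding the nexthop of the broadcast entry in the bridge table and then inspecting it with `nh --get`. A minimal sketch of that first step follows. The function name is hypothetical, the sample is copied from the `rt --family bridge --dump 1978` output in this report, and it assumes the nexthop ID is always the last column of a row.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

def broadcast_nexthop(rt_dump):
    """Return the nexthop ID of the broadcast row in bridge table text, or None."""
    for line in rt_dump.splitlines():
        fields = line.split()
        # Data rows look like: Index DestMac [Flags] Label/VNID Nexthop
        if len(fields) >= 3 and fields[1].lower() == BROADCAST_MAC:
            return int(fields[-1])
    return None

# Copied from the VN1 bridge table (VRF 1978) in this report.
RT_DUMP_1978 = """\
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/1978
Index DestMac Flags Label/VNID Nexthop
51356 90:1b:e:52:bb:ea Df - 3
296020 30:20:0:8:0:3 - 1
367936 ff:ff:ff:ff:ff:ff LDf 524 13327
909752 30:20:0:8:0:4 - 1
974164 30:48:4:0:0:8 LDf 524 16
"""

print(broadcast_nexthop(RT_DUMP_1978))  # 13327
```

The returned ID is the value passed to `nh --get` in the steps above.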

The customer would like to understand the root cause of this issue from the logs provided.

The files below were collected from the customer and are available on my local log server:

root@10.219.48.123, pwd:Jtaclab123

File upload directory path: /home/kannan/bumtree

Contrail:
testbed.py
Logs under /var/log/contrail
gcore file of the tor-agent process

QFX:
RSI
Archive of /var/log
Information on the VNs that lost communication