[R2.20] Incorrect NH on TSN causes VM<>BMS packets to be dropped
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R2.20 | Fix Committed | Medium | Manish Singh |
Trunk | Fix Committed | Medium | Manish Singh |
Bug Description
R2.20 Ubuntu 14.0.4
The overlay VM ended up unable to reach the BMS server. I have not tried to reproduce the problem to pin down the exact steps, because I wanted to leave the setup as is in case someone wants to debug it.
Started with 2 TORs (ovs/vtep), each having 1 BMS. The Tor Agent connections to the 2 TORs are up and the tunnels are set up correctly. The BMS MACs are known to the Tor Agent. Then added a guest VM in VN1. The BMSs are part of the same VN.
From the vrouter hosting the VM (14.1.1.3):
Flags: L=Label Valid, P=Proxy ARP, T=Trap ARP, F=Flood ARP
Destination PPL Flags Label Nexthop Stitched MAC(Index)
14.1.1.1/32 32 PT - 6 -
14.1.1.2/32 32 PT - 6 -
14.1.1.3/32 32 P - 15 2:42:4d:
169.254.169.254/32 32 PT - 5 -
Kernel L2 Bridge table 0/1
Flags: L=Label Valid, Df=DHCP flood
Index DestMac Flags Label/VNID Nexthop
97192 ff:ff:ff:ff:ff:ff L 4 23
150292 0:10:18:96:f0:b6 - 2
156248 0:1:0:0:0:3b L 4 13
172048 2:42:4d:b0:a8:6e - 18
242336 0:1:0:0:0:3c L 4 12
252916 0:0:5e:0:1:0 - 2
0:1:0:0:0:3b and 0:1:0:0:0:3c are the two BMSs.
The VxLAN tunnels to the BMSs are set up correctly:
root@c4-fpc10:~# nh --get 13
Id:013 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:2 Vrf:0
Oif:0 Len:14 Flags Valid, Vxlan, Data:f8 c0 01 23 13 c5 00 10 18 96 f0 b6 08 00
Vrf:0 Sip:11.1.7.2 Dip:11.1.9.2
root@c4-fpc10:~# nh --get 12
Id:012 Type:Tunnel Fmly: AF_INET Flags:Valid, Vxlan, Rid:0 Ref_cnt:2 Vrf:0
Oif:0 Len:14 Flags Valid, Vxlan, Data:f8 c0 01 23 13 c5 00 10 18 96 f0 b6 08 00
Vrf:0 Sip:11.1.7.2 Dip:11.1.6.2
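The Data field of the tunnel next hops above is 14 bytes long (Len:14), which is the size of an Ethernet II header. As a minimal sketch, assuming the bytes are laid out as destination MAC, source MAC and EtherType, the field can be decoded as follows. The helper name decode_rewrite is hypothetical and is not part of any vrouter tool.

```python
def decode_rewrite(data_hex: str) -> dict:
    """Split a 14-byte rewrite string into Ethernet header fields.

    Assumption: the bytes form a plain Ethernet II header, i.e.
    6-byte destination MAC, 6-byte source MAC, 2-byte EtherType.
    """
    octets = data_hex.split()
    if len(octets) != 14:
        raise ValueError(f"expected 14 bytes, got {len(octets)}")
    return {
        "dst_mac": ":".join(octets[0:6]),
        "src_mac": ":".join(octets[6:12]),
        "ethertype": "0x" + "".join(octets[12:14]),
    }


# Data string copied from the `nh --get 13` output above.
hdr = decode_rewrite("f8 c0 01 23 13 c5 00 10 18 96 f0 b6 08 00")
print(hdr)
```

Under that assumption the source MAC decodes to 00:10:18:96:f0:b6, the same address that appears in the bridge table as 0:10:18:96:f0:b6, and the EtherType 0x0800 is IPv4.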
The L2 broadcast route points to the TSN:
root@c4-fpc10:~# nh --get 23
Id:023 Type:Composite Fmly:AF_BRIDGE Flags:Valid, Multicast, L2, Rid:0 Ref_cnt:4 Vrf:1
Sub NH(label): 22(0) 19(0)
root@c4-fpc10:~# nh --get 22
Id:022 Type:Composite Fmly: AF_INET Flags:Valid, Fabric, Rid:0 Ref_cnt:2 Vrf:1
Sub NH(label): 21(1030)
root@c4-fpc10:~# nh --get 21
Id:021 Type:Tunnel Fmly: AF_INET Flags:Valid, MPLSoGRE, Rid:0 Ref_cnt:2 Vrf:0
Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:f8 c0 01 23 13 c5 00 10 18 96 f0 b6 08 00
Vrf:0 Sip:11.1.7.2 Dip:11.1.1.2
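The chain 23 → 22 → 21 above was followed by hand. A rough sketch of how the sub next hops and their labels could be pulled out of a composite next hop line is shown here. It assumes the `Sub NH(label):` line format seen in this output, and parse_sub_nh is a hypothetical helper, not an existing utility.

```python
import re
from typing import List, Tuple


def parse_sub_nh(line: str) -> List[Tuple[int, int]]:
    """Return (next-hop id, label) pairs from a 'Sub NH(label):' line.

    Lines without the 'Sub NH(label):' prefix yield an empty list.
    """
    _, _, rest = line.partition("Sub NH(label):")
    return [(int(nh), int(label))
            for nh, label in re.findall(r"(\d+)\((\d+)\)", rest)]


# Lines copied from the `nh --get 23` and `nh --get 22` output above.
l2_members = parse_sub_nh("Sub NH(label): 22(0) 19(0)")
fabric_members = parse_sub_nh("Sub NH(label): 21(1030)")
print(l2_members)
print(fabric_members)
```

The second call yields next hop 21 with label 1030, which is the label the TSN at 11.1.1.2 is expected to know about.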
However, on the TSN, there is no corresponding entry for MPLS label 1030.
root@c1-qa15:~# mpls --dump
MPLS Input Label Map
Label NextHop
-------------------
16 9
As a result, all the broadcast packets from the compute server hosting the guest VM are dropped on the TSN with reason 'invalid NH'.
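The mismatch can be checked mechanically by comparing the label carried in the fabric composite next hop (1030) with the labels present in the TSN's `mpls --dump` output. The following is only a sketch that parses the text shown above. The function name parse_mpls_dump is hypothetical, and the parser assumes every data row consists of exactly two integers.

```python
def parse_mpls_dump(text: str) -> dict:
    """Map label -> next-hop id from `mpls --dump` style output.

    Assumption: data rows are two whitespace-separated integers;
    header and separator lines are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            table[int(fields[0])] = int(fields[1])
    return table


# Output copied from the TSN (c1-qa15) above.
TSN_DUMP = """\
MPLS Input Label Map
Label NextHop
-------------------
16 9
"""

labels = parse_mpls_dump(TSN_DUMP)
print(1030 in labels)  # False: the TSN has no entry for label 1030
```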
Topology:
#Role definition of the hosts.
env.roledefs = {
'all': [host1, host2, host3, host4, host5],
'cfgm': [host3],
'openstack': [host3],
'control': [host3],
'compute': [host1, host2, host4, host5],
'collector': [host3],
'webui': [host3],
'database': [host3],
'build': [host_build],
'toragent': [host2],
'tsn': [host2],
'storage-
'storage-
}
controller: 10.87.129.245 root:n1keenA
Changed in juniperopenstack:
assignee: nobody → Manish Singh (manishs)
tags: added: vrouter
tags: added: blocker
information type: Proprietary → Public
The core file can be found here:
10.84.5.100:/cs-shared/bugs/1450683/core.9946
In addition, a Tor Agent crash was seen on the setup. If it turns out that this bug and the crash have different root causes, we can open another bug to track the latter.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000824380 in AgentPath::nexthop() const ()
(gdb) bt
#0 0x0000000000824380 in AgentPath::nexthop() const ()
#1 0x00000000009c9c24 in AgentXmppChannel::BuildTorMulticastMessage(autogen::EnetItemType&, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&, AgentRoute*, boost::asio::ip::address_v4 const*, std::string const&, std::vector<int, std::allocator<int> > const*, unsigned int, unsigned int, bool) ()
#2 0x00000000009cb7ec in AgentXmppChannel::ControllerSendEvpnRouteCommon(AgentRoute*, boost::asio::ip::address_v4 const*, std::string, std::vector<int, std::allocator<int> > const*, unsigned int, unsigned int, bool) ()
#3 0x00000000009cbe2c in AgentXmppChannel::ControllerSendEvpnRouteDelete(AgentXmppChannel*, AgentRoute*, std::string, unsigned int, unsigned int) ()
#4 0x00000000009b55ab in RouteExport::MulticastNotify(AgentXmppChannel*, bool, DBTablePartBase*, DBEntryBase*) ()
#5 0x0000000000ca0e82 in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#6 0x0000000000ca3518 in DBTablePartBase::RunNotify() ()
#7 0x0000000000c9f9ed in DBPartition::QueueRunner::Run() ()
#8 0x0000000000d9da70 in TaskImpl::execute() ()
#9 0x00007fe62c67ab3a in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fe62c676816 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fe62c675f4b in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fe62c6720ff in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fe62c6722f9 in ?? () from /usr/lib/libtbb.so.2
#14 0x00007fe62c896182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fe62bb6efbd in clone () from /lib/x86_64-linux-gnu/libc.so.6