service chain RPF route issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R3.2 | New | High | Sivakumar Ganapathy |
R4.0 | New | High | Raghunandan Srinivasan |
R4.1 | New | High | Raghunandan Srinivasan |
R5.0 | Incomplete | High | ping |
Trunk | New | High | Raghunandan Srinivasan |
Bug Description
## issue
version 3.2.8.
ATT noticed that traffic from some VMs does not reach the server after
traversing a service chain (in-network NAT) hosted by certain compute nodes.
The RPF check at the "right" interface toward the source fails (the local tap
interface is not included), so dropstats shows an increasing "Invalid Source"
counter. After troubleshooting with ATT on a bridge, this turned out to be a
routing issue: the forwarding entry toward the source does not include the tap
interface as a nexthop, and the cause appears to be an unexpected path
preference of '100' instead of '200', which is what the working compute shows.
The issue was noticed around April 6. ATT recovered the issue as soon as
possible since it is service impacting. JTAC collected logs and a gcore.
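The RPF behavior described above can be sketched in Python. This is illustrative only: the nexthop IDs are taken from the outputs later in this report, but the data structures are hypothetical, not vrouter's actual (kernel C) implementation.

```python
# Sketch of vrouter's RPF check: the packet's source IP is looked up in the
# receiving VRF, and the packet passes only if the nexthop it arrived on is a
# member of the composite nexthop that the source route resolves to.

def rpf_check(route_nh_members, incoming_nh):
    """Return True if the incoming nexthop is a member of the composite
    nexthop resolved for the source prefix."""
    return incoming_nh in route_nh_members

# Working compute: the composite NH for the source prefix includes the
# local tap interface NH (523 here, matching the flow's K(nh) below).
working_members = {217, 107, 157, 41, 99, 523}

# Bad compute: composite NH 333 is missing the local tap NH.
bad_members = {217, 107, 157, 41, 99}

assert rpf_check(working_members, 523)
# Missing tap NH -> RPF fails, packet dropped, "Invalid Source" counter grows.
assert not rpf_check(bad_members, 523)
```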
## diagram
![image](https:/
![image](https:/
## logs/gcore collected
pings@
total 1284120K
-rw-r--r-- 1 jtac support 1279392280 Apr 27 07:38 alp1r24c005_
-rw-r--r-- 1 jtac support 4305980 Apr 26 14:13 alp1r24c005_
-rw-r--r-- 1 jtac support 3938 Apr 26 09:15 DCAE_issue_
-rw-r--r-- 1 jtac support 6418575 Apr 25 14:16 alp1r24c005.
-rw-r--r-- 1 jtac support 14365 Apr 25 14:03 alp1_DCAE_issue.txt
-rw-r--r-- 1 jtac support 42038 Apr 25 14:01 ALP1B_issue_
-rw-r--r-- 1 jtac support 6359981 Apr 25 14:01 alp1r24c010.
-rw-r--r-- 1 jtac support 13130986 Apr 25 14:01 alp1r24c005_
The password for the server below is "Juniper".
root@
total 1279060
-rw-r--r-- 1 801 20062 1279392280 Apr 27 07:38 alp1r24c005_
-rw-r--r-- 1 801 20062 4305980 Apr 26 14:13 alp1r24c005_
drwxr-xr-x 3 root root 4096 Apr 26 13:48 var
-rw-rw-r-- 1 801 20062 63008 Apr 26 13:25 ist.py
-rw-r--r-- 1 801 20062 3938 Apr 26 09:15 DCAE_issue_
-rw-r--r-- 1 801 20062 6418575 Apr 25 14:16 alp1r24c005.
-rw-r--r-- 1 801 20062 14365 Apr 25 14:03 alp1_DCAE_issue.txt
-rw-r--r-- 1 801 20062 42038 Apr 25 14:01 ALP1B_issue_
-rw-r--r-- 1 801 20062 6359981 Apr 25 14:01 alp1r24c010.
-rw-r--r-- 1 801 20062 13130986 Apr 25 14:01 alp1r24c005_
## suggested recovery steps
* take a gcore of the vrouter agent (this is a kernel-based vrouter) and upload it to the case
* restart the vrouter agent on the problematic compute #<------ issue resolved at this step
* if the issue is not resolved, bounce the service chain static routes:
  - remove
  - save
  - add
  - save
## suspicious flow entries
bs1971@
Flow table(size 322174976, entries 2516992)
Entries: Created 345930850 Added 345930719 Deleted 15433217 Changed 15436421 Processed 345930850 Used Overflow entries 0
(Created Flows/CPU: 340725929 111891 696345 52661 175007 181387 54304 170693 183314 181680 180469 185709 840757 53312 52772 61569 55899 52093 53412 52993 53818 53260 55119 52516 464950 55791 56084 56638 62154 58043 54521 56337 823 1312 1085 1072 725131 0 0 0 0 0 0 0 0 0 0 0)(oflows 0)
Action:
Other:
Flags:
TCP(
Listing flows matching ([107.239.
Index Source:
(Gen: 98, K(nh):523, Action:F, Flags:, QOS:-1, S(nh):497, Stats:0/0,
SPort 52103, TTL 0, Sinfo 0.0.0.0)
(Gen: 115, K(nh):519, Action:F, Flags:, QOS:-1, S(nh):519, Stats:0/0,
SPort 55495, TTL 0, Sinfo 0.0.0.0)
(Gen: 182, K(nh):523, Action:F, Flags:, E:1, QOS:-1, S(nh):333, Stats:10/2210, #<------
SPort 54236, TTL 0, Sinfo 11.0.0.0)
1243828<
sudo 32.131.196.51:162
(Gen: 255, K(nh):519, Action:F, Flags:, QOS:-1, S(nh):123, Stats:10/2070,
SPort 58159, TTL 0, Sinfo 172.29.6.224)
NOTE:
* sourceIP: 107.239.223.197
* destination IP: 32.131.196.51
* V17 is right vrf MNS-25180-
* V15 is left vrf; MNS-25180-
* V8 is left service vrf; MNS-25180-
* 333 NH: to sourceIP
* 523 NH: to compute zalp1bfrwl01oam008
## S(nh) 333 -> source 107.239.223.197
In this nexthop resolution, the local tap interface does not show up, which causes the RPF failure.
bs1971@
Id:333 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:257 Vrf:17
Id:217 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:140 Vrf:0
Id:107 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
Id:157 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:102 Vrf:0
Id:41 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:98 Vrf:0
Id:99 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
## static route configured for service chain
107.
107.
107.
### static route to 107.239.220.0/22: bad compute
The path pointing to the local tap interface has preference 100, which is the
problem.
bs1971@
107.
via [], nh_index:333 , nh_type:ECMP Composite sub nh count: 8, nh_policy:disabled, active_label:-1, vxlan_id:0
via [], nh_index:798 , nh_type:ECMP Composite sub nh count: 8, nh_policy:disabled, active_label:-1, vxlan_id:0
to 2:49:c7:af:f5:b7 via tap49c7aff5-b7, assigned_label:41, nh_index:523 , nh_type:interface, nh_policy:enabled, active_label:41, vxlan_id:0
The tap interface is missing from the composite NH pointing to the source:
bs1971@
Id:333 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:257 Vrf:17
Id:217 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:140 Vrf:0
Id:107 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
Id:157 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:102 Vrf:0
Id:41 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:98 Vrf:0
Id:99 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
### static route to 107.239.220.0/22: working compute
bs1971@
Introspect Host: 172.29.6.174
107.
via ['tape1790263-cf'], nh_index:507 , nh_type:ECMP Composite sub nh count: 8, nh_policy:enabled, active_label:-1, vxlan_id:0
via ['tape1790263-cf'], nh_index:507 , nh_type:ECMP Composite sub nh count: 8, nh_policy:enabled, active_label:-1, vxlan_id:0
to 2:e1:79:2:63:cf via tape1790263-cf, assigned_label:24, nh_index:136 , nh_type:interface, nh_policy:enabled, active_label:24, vxlan_id:0
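The contrast between the bad and working computes above can be sketched in Python. This is a simplified model of preference-based ECMP path selection; the path objects and peer names are illustrative, not the agent's actual data structures.

```python
# Sketch: the agent builds the composite (ECMP) nexthop only from paths that
# carry the best (highest) preference. A local tap path stuck at preference
# 100 is therefore excluded when the peer paths sit at 200.

def ecmp_members(paths):
    """Return the nexthops of all paths sharing the best (highest) preference."""
    best = max(p["pref"] for p in paths)
    return [p["nh"] for p in paths if p["pref"] == best]

# Bad compute: local tap path at pref 100, remote paths at 200 -> tap excluded.
bad = [{"nh": "tunnel-peer%d" % i, "pref": 200} for i in range(1, 8)]
bad.append({"nh": "tap49c7aff5-b7", "pref": 100})
assert "tap49c7aff5-b7" not in ecmp_members(bad)

# Working compute: local tap path also at pref 200 -> included in composite NH.
good = [{"nh": "tunnel-peer%d" % i, "pref": 200} for i in range(1, 8)]
good.append({"nh": "tape1790263-cf", "pref": 200})
assert "tape1790263-cf" in ecmp_members(good)
```

This matches the observation that restarting the vrouter agent (which rebuilds its path database) recovers the issue: once the local path regains the expected preference, it rejoins the composite NH and the RPF check passes again.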
## tap interface details
The tap interface does have the three configured static routes:
bs1971@
ItfSandeshData
index: 11
name: tap49c7aff5-b7
uuid: 49c7aff5-
vrf_name: default-
active: Active
ipv4_active: Active
l2_active: L2 Active
ip6_active: Active
health_
dhcp_service: Enable
dns_service: Enable
type: vport
label: 41
l2_label: 42
vxlan_id: 83
vn_name: default-
vm_uuid: 30b0112e-
vm_name: zalp1bfrwl01oam008
ip_addr: 107.239.238.15
mac_addr: 02:49:c7:af:f5:b7
policy: Enable
fip_list
mdata_
service_
os_ifindex: 33
fabric_port: NotFabricPort
alloc_
analyzer_name
config_name: default-
sg_uuid_list
static_
prefix: 21
prefix: 22
prefix: 22
prefix: 64
vm_
admin_state: Enabled
flow_key_idx: 523
allowed_
ip6_addr: 2600:308:160:202::8
local_
tx_vlan_id: -1
rx_vlan_id: -1
parent_
subnet: --NA--
sub_type: Tap
vrf_
vmi_type: Virtual Machine
transport: Ethernet
logical_
flood_
physical_
physical_
fixed_
fixed_
fat_flow_list
metadata_
service_
alias_ip_list
drop_
### route toward the first fixed IP
The route toward the first fixed IP does have all 8 nexthops, representing the
8 computes hosting the 8 FWs, including self:
172.29.6.174
172.29.6.230
172.29.7.169
172.29.7.170
172.29.8.165
172.29.8.229
172.29.6.174
172.29.6.227 (self)
python ist.py vr route -v 17 -p 107.239.238.8/32
107.
via ['tap49c7aff5-b7'], nh_index:304 , nh_type:ECMP Composite sub nh count: 8, nh_policy:enabled, active_label:-1, vxlan_id:0
via ['tap49c7aff5-b7'], nh_index:292 , nh_type:ECMP Composite sub nh count: 8, nh_policy:enabled, active_label:-1, vxlan_id:0
to 2:49:c7:af:f5:b7 via tap49c7aff5-b7, assigned_label:41, nh_index:523 , nh_type:interface, nh_policy:enabled, active_label:41, vxlan_id:0
[INET-EVPN] pref:100
nh_index:0 , nh_type:None, nh_policy:, active_label:-1, vxlan_id:0
bs1971@
Id:304 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:17
Id:41 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:98 Vrf:0
Id:107 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
Id:170 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:128 Vrf:0
Id:217 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:140 Vrf:0
Id:60 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:50 Vrf:0
Id:157 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:102 Vrf:0
Id:99 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:149 Vrf:0
Id:525 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:9 Vrf:17
information type: Proprietary → Public
The problem is probably due to the path in the agent not being updated with the correct preference. Someone from the agent side will take a look at this issue.
Thanks
Mahesh