Service Health Check doesn't recover after failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R4.1 | Fix Committed | High | Hari Prasad Killi |
Trunk | Fix Committed | High | Hari Prasad Killi |
Bug Description
version: 4.1.0.0-1
If the health check fails and then recovers, the interface never becomes active again.
After the failure, the probes never reach the VM tap interface, and the discard counter keeps increasing; we believe this is why the service does not recover after the restore.
Workaround: disassociate and re-associate the SHC on the VMI.
Also observed that the HC probes don't wait for the configured delay seconds between retries upon failure (not sure if this is by design).
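For reference, the expected retry behavior is that successive probe attempts are spaced by the configured delay. A minimal sketch of such a loop (hypothetical helper, not Contrail's implementation):

```python
import time

def probe_with_retries(probe, retries=3, delay=2.0):
    """Run `probe` up to `retries` times, sleeping `delay` seconds
    between attempts. Returns True on the first success."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            # This is the wait that the capture below suggests is being skipped.
            time.sleep(delay)
    return False
```

With a delay of 2 seconds, consecutive SYNs from the probe source should be at least 2 s apart in a capture; the tcpdump below shows them milliseconds apart.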
tcpdump showing the probe retry interval:
12:27:40.377930 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.1.1.2 (00:00:5e:00:01:00) tell 12.1.1.2, length 28
12:27:40.427450 IP (tos 0x0, ttl 63, id 37879, offset 0, flags [DF], proto TCP (6), length 60)
12.1.1.2.43546 > 12.1.1.3.80: Flags [S], cksum 0x1a35 (incorrect -> 0x7d1c), seq 1556438834, win 29200, options [mss 1460,sackOK,TS val 195415237 ecr 0,nop,wscale 7], length 0
12:27:40.427697 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
12.1.1.3.80 > 12.1.1.2.43546: Flags [R.], cksum 0x2f67 (correct), seq 0, ack 1556438835, win 0, length 0
12:27:40.440191 IP (tos 0x0, ttl 63, id 28092, offset 0, flags [DF], proto TCP (6), length 60)
12.1.1.2.43548 > 12.1.1.3.80: Flags [S], cksum 0x1a35 (incorrect -> 0xf805), seq 2677318004, win 29200, options [mss 1460,sackOK,TS val 195415240 ecr 0,nop,wscale 7], length 0
12:27:40.440352 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
12.1.1.3.80 > 12.1.1.2.43548: Flags [R.], cksum 0xaa53 (correct), seq 0, ack 2677318005, win 0, length 0
12:27:40.443326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.1.1.1 (00:00:5e:00:01:00) tell 12.1.1.1, length 28
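The spacing between the two SYN retries can be computed directly from the timestamps in the capture above (values copied verbatim):

```python
from datetime import datetime

# SYN timestamps of the two consecutive probe attempts, from the capture above
t1 = datetime.strptime("12:27:40.427450", "%H:%M:%S.%f")
t2 = datetime.strptime("12:27:40.440191", "%H:%M:%S.%f")

interval = (t2 - t1).total_seconds()
print(f"retry interval: {interval * 1000:.1f} ms")  # ~12.7 ms
```

A gap of roughly 13 ms between retries is far below any delay configured in seconds, which supports the observation that the delay is not honored between retries.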
Flow details:
root@5b4s10:~# flow -l | grep -2 508368
SPort 55171, TTL 0, Sinfo 0.0.0.0)
187416<=>508368 12.1.1.3:80 6 (2->0)
(Gen: 1, K(nh):23, Action:N(SD), Flags:, TCP:Sr, QOS:-1, S(nh):23, Stats:0/0,
--
SPort 61408, TTL 0, Sinfo 0.0.0.0)
508368<=>187416 5.5.5.133:43484 6 (0->2)
(Gen: 10, K(nh):5, Action:N(SD), Flags:, TCP:S, QOS:-1, S(nh):10, Stats:3/180,
root@5b4s10:~# flow --get 508368
Flow Index: 508368
Flow Generation ID: 10
Reverse Flow Index: 187416
VRF: 0
Destination VRF: 2
Flow Source: [5.5.5.133]:43484
Flow Destination: [169.254.
Flow Protocol: TCP
Flow Action: NAT: SourceNAT, DestinationNAT,
Expected Source: NextHop(Index, VRF, Type): 10, 1, RECEIVE
Source Information: VRF: 0
Destination Information: VRF: 2
Flow Flags:
TCP FLAGS: SYN,
UDP Source Port: 56500
Flow Statistics: 3/180
System Wide Packet Drops: 2045631
root@5b4s10:~# dropstats | grep -v ' 0$' | grep -v '^$' | grep -v Cloned | grep -v Duplicated
Discards 446
Invalid NH 3
No L2 Route 10
root@5b4s10:~# dropstats | grep -v ' 0$' | grep -v '^$' | grep -v Cloned | grep -v Duplicated
Discards 449
Invalid NH 3
No L2 Route 10
root@5b4s10:~# dropstats | grep -v ' 0$' | grep -v '^$' | grep -v Cloned | grep -v Duplicated
Discards 452
Invalid NH 3
No L2 Route 10
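The three dropstats samples above show only the Discards counter moving while Invalid NH and No L2 Route stay flat; the per-sample deltas (values copied from the output above) make this explicit:

```python
# Successive "Discards" readings from the three dropstats runs above
discards = [446, 449, 452]
invalid_nh = [3, 3, 3]

deltas = [b - a for a, b in zip(discards, discards[1:])]
print("discard deltas between samples:", deltas)  # [3, 3]
```

A steady +3 per sample matches the failing probe traffic being discarded instead of reaching the tap interface.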
tags: added: blocker
The issue occurs when policy is enabled on the vhost interface (in the latest 4.1 builds, policy on the vhost interface is disabled by default, so this problem will not be seen). The health check packets need to be made to go through in this scenario.