[K8s] Build 4.0.0.0-3041: Service in user-created namespace: Load balancing is not happening across all members

Bug #1670877 reported by chhandak
This bug affects 1 person
Affects            Status                   Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  Trunk            Invalid                  Critical    Ritesh

Bug Description

Description:
When we create a service in a user-created namespace, load balancing does not happen across all the members.

The FIP is associated with all the member interfaces according to agent introspect, but nh --get for the VIP returns only one of the member interfaces instead of a composite next hop.

Traffic results indicate the same: traffic always goes to the single backend that the vrouter points to.

Restarting the agent resolved the problem.

[root@kube-system-1 ~]# kubectl get namespace
NAME STATUS AGE
default Active 10d
development Active 6h
kube-system Active 10d

[root@kube-system-1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-dev-3c71m 1/1 Running 0 2h
nginx-dev-ch3sk 1/1 Running 0 2h
nginx-dev-x312g 1/1 Running 0 2h
ubuntuapp-3 1/1 Running 0 9m

[root@kube-system-1 ~]# kubectl describe service my-service-dev
Name: my-service-dev
Namespace: development
Labels: <none>
Selector: app=nginx
Type: ClusterIP
IP: 10.97.93.86
Port: <unset> 80/TCP
Endpoints: 10.47.255.243:80,10.47.255.244:80,10.47.255.246:80
Session Affinity: None

The FIP is associated with all the interfaces:

http://10.87.120.43:8085/Snh_ItfReq?name=&type=&uuid=&vn=&mac=&ipv4_address=&ipv6_address=&parent_uuid=&ip_active=&ip6_active=&l2_active=

root@kube-system-2(agent):/# rt --dump 1 | grep 10.97.93.86
10.97.93.86/32 32 P - 49 -
root@kube-system-2(agent):/# nh --get 49
Id:49 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:5 Vrf:1
              Flags:Valid, Policy, Etree Root,
              EncapFmly:0806 Oif:11 Len:14
              Encap Data: 02 f2 bc 31 56 02 00 00 5e 00 01 00 08 00
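
As a minimal sketch, the check above can be scripted: the Python below parses `nh --get` text and reports whether the next hop is a Composite ECMP next hop with more than one sub next hop. The names `parse_nh` and `is_load_balanced` are hypothetical helpers, not part of any Contrail tooling. The sample strings are copied from the `nh --get 49` output above and from the `nh --get 54` output captured after the agent restart later in this report.

```python
import re

# Sample text copied from this report (before and after the agent restart).
NH_BEFORE = """Id:49 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:5 Vrf:1
              Flags:Valid, Policy, Etree Root,
              EncapFmly:0806 Oif:11 Len:14
"""
NH_AFTER = """Id:54 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1
              Flags:Valid, Policy, Ecmp, Etree Root,
              Valid Hash Key Parameters: Proto,SrcIP,SrcPort,DstIp,DstPort
              Sub NH(label): 15(123) 27(124) 35(122)
"""


def parse_nh(text):
    """Parse `nh --get` style text into a list of dicts, one per next hop."""
    next_hops = []
    # Next-hop entries are separated by blank lines in the output.
    for block in re.split(r"\n\s*\n", text.strip()):
        header = re.search(r"Id:(\d+)\s+Type:(\w+)", block)
        if not header:
            continue
        flags_line = re.search(r"Flags:([^\n]*)", block)
        flags = []
        if flags_line:
            flags = [f.strip() for f in flags_line.group(1).split(",") if f.strip()]
        sub_line = re.search(r"Sub NH\(label\):([^\n]*)", block)
        sub_nh = []
        if sub_line:
            # Entries look like "15(123)": next-hop id followed by the label.
            sub_nh = [int(i) for i in re.findall(r"(\d+)\(\d+\)", sub_line.group(1))]
        next_hops.append({
            "id": int(header.group(1)),
            "type": header.group(2),
            "flags": flags,
            "sub_nh": sub_nh,
        })
    return next_hops


def is_load_balanced(nh):
    """True if the next hop is a Composite ECMP next hop with several members."""
    return nh["type"] == "Composite" and "Ecmp" in nh["flags"] and len(nh["sub_nh"]) > 1


for sample in (NH_BEFORE, NH_AFTER):
    nh = parse_nh(sample)[0]
    print(nh["id"], nh["type"], "load balanced:", is_load_balanced(nh))
```

In practice the text would come from running `nh --get <id>` on the compute node; here it is embedded so the sketch runs on its own.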

root@kube-system-2(agent):/# vif --get 11
Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root

vif0/11 OS: cn-6754
            Type:Virtual HWaddr:00:00:5e:00:01:00 IPaddr:0
            Vrf:1 Flags:PL3DEr MTU:9160 QOS:-1 Ref:5
            RX packets:1097 bytes:46522 errors:0
            TX packets:2203 bytes:93250 errors:0
            Drops:1095

root@kube-system-2(agent):/# flow -l
Flow table(size 80609280, entries 629760)

Entries: Created 4 Added 4 Deleted 0 Changed 0 Processed 4 Used Overflow entries 0
(Created Flows/CPU: 1 1 2 0 0 0 0 0)(oflows 0)

Action:F=Forward, D=Drop N=NAT(S=SNAT, D=DNAT, Ps=SPAT, Pd=DPAT, L=Link Local Port)
 Other:K(nh)=Key_Nexthop, S(nh)=RPF_Nexthop
 Flags:E=Evicted, Ec=Evict Candidate, N=New Flow, M=Modified Dm=Delete Marked
TCP(r=reverse):S=SYN, F=FIN, R=RST, C=HalfClose, E=Established, D=Dead

    Index Source:Port/Destination:Port Proto(V)
-----------------------------------------------------------------------------------
      900<=>94460 10.47.255.242:47546 6 (1->1)
                         10.97.93.86:80
(Gen: 10, K(nh):58, Action:N(D), Flags:, TCP:SSrEEr, E:0, QOS:-1, S(nh):58,
 Stats:2/140, SPort 52720, TTL 0, Sinfo 12.0.0.0)

    94460<=>900 10.47.255.243:80 6 (1->1) >>>>>> Always results in this backend. Vrouter is also pointing to this backend
                         10.47.255.242:47546
(Gen: 11, K(nh):49, Action:N(S), Flags:, TCP:SSrEEr, E:0, QOS:-1, S(nh):49,
 Stats:1/74, SPort 65486, TTL 0, Sinfo 11.0.0.0)

root@kube-system-2(agent):/# service contrail-vrouter-agent restart
contrail-vrouter-agent: stopped
contrail-vrouter-agent: started
root@kube-system-2(agent):/#
root@kube-system-2(agent):/#
root@kube-system-2(agent):/# rt --dump 1 | grep 10.97.93.86
10.97.93.86/32 32 P - 54 -
root@kube-system-2(agent):/# nh --get 54
Id:54 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1
              Flags:Valid, Policy, Ecmp, Etree Root,
              Valid Hash Key Parameters: Proto,SrcIP,SrcPort,DstIp,DstPort
              Sub NH(label): 15(123) 27(124) 35(122)

Id:15 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:3 Vrf:1
              Flags:Valid, Etree Root,
              EncapFmly:0806 Oif:5 Len:14
              Encap Data: 02 f2 bc 31 56 02 00 00 5e 00 01 00 08 00

Id:27 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:3 Vrf:1
              Flags:Valid, Etree Root,
              EncapFmly:0806 Oif:7 Len:14
              Encap Data: 02 f2 1e 37 8a 02 00 00 5e 00 01 00 08 00

Id:35 Type:Encap Fmly: AF_INET Rid:0 Ref_cnt:3 Vrf:1
              Flags:Valid, Etree Root,
              EncapFmly:0806 Oif:9 Len:14
              Encap Data: 02 f2 6a 8d 24 02 00 00 5e 00 01 00 08 00
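
The composite next hop above lists Proto, SrcIP, SrcPort, DstIp and DstPort as its valid hash key parameters. The sketch below only illustrates the general idea of choosing one member per flow from those five fields. It uses SHA-256 as a stand-in and is not the vrouter's actual hash function; `pick_member` is a hypothetical name. The addresses are the service endpoints, client and VIP from this report.

```python
import hashlib


def pick_member(members, proto, src_ip, src_port, dst_ip, dst_port):
    """Pick one backend for a flow by hashing its 5-tuple (illustrative only)."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes of the digest onto the member list.
    return members[int.from_bytes(digest[:4], "big") % len(members)]


# Endpoints of my-service-dev as shown by `kubectl describe service`.
members = ["10.47.255.243", "10.47.255.244", "10.47.255.246"]

# Simulate 200 client flows that differ only in source port.
chosen = {
    pick_member(members, 6, "10.47.255.242", port, "10.97.93.86", 80)
    for port in range(40000, 40200)
}
print(sorted(chosen))
```

With a single Encap next hop, as seen before the restart, there is effectively one member in the list, so every flow lands on the same backend regardless of its 5-tuple.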

Revision history for this message
chhandak (chhandak) wrote :

Copied logs to /auto/cores/1670877

Changed in juniperopenstack:
importance: Undecided → Critical
assignee: nobody → Hari Prasad Killi (haripk)
milestone: none → r4.0
information type: Proprietary → Public
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → jayaramsatya (jayaramsatya)
jayaramsatya (jayaramsatya) wrote :

This seems to be a private build: the core file does not match the agent binary in 3041.

{"build-info":[{"build-time":"2017-02-24 22:22:48.905957","build-hostname":"ubuntu-build04","build-user":"ymariappan","build-version":"4.0.0.0","build-id":"4.0.0.0-3041","build-number":"3041"}]}

Tried recreating the issue with build 3049; it is not reproducible.

chhandak (chhandak) wrote :

Copied the binary to the same location. I was able to decode the gcore file on the following testbed.
ssh root@10.87.120.16(roo/c0ntrail123) and then docker exec -it agent bash.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00002abab10992db in ?? () from /usr/lib/libtbb.so.2
#2 0x00002abab10992f9 in ?? () from /usr/lib/libtbb.so.2
#3 0x00002abab0e6b184 in start_thread (arg=0x2abab8e00700) at pthread_create.c:312
#4 0x00002abab1bcf37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) info threads
  Id Target Id Frame
  9 Thread 0x2abaaf27b480 (LWP 10697) 0x00002abab1bcfa13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
  8 Thread 0x2ababaa07700 (LWP 10720) 0x00002abab0e723ad in read () at ../sysdeps/unix/syscall-template.S:81
  7 Thread 0x2ababa606700 (LWP 10719) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  6 Thread 0x2abab9e04700 (LWP 10717) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  5 Thread 0x2ababa205700 (LWP 10718) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  4 Thread 0x2abab9a03700 (LWP 10716) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3 Thread 0x2abab9602700 (LWP 10715) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  2 Thread 0x2abab9201700 (LWP 10714) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
* 1 Thread 0x2abab8e00700 (LWP 10713) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

-rwxrwxrwx 1 chhandak epbg 260081856 Mar 7 15:06 core.10697
-rwxrwxrwx 1 chhandak epbg 41458 Mar 7 15:07 contrail-lbaas-haproxy-stdout.log
-rwxrwxrwx 1 chhandak epbg 1854 Mar 7 15:07 contrail-vrouter-agent-stdout.log
-rwxrwxrwx 1 chhandak epbg 988164 Mar 7 15:07 contrail-vrouter-agent.log
-rwxrwxrwx 1 chhandak epbg 1048814 Mar 7 15:07 contrail-vrouter-agent.log.1
-rwxrwxrwx 1 chhandak epbg 1048792 Mar 7 15:07 contrail-vrouter-agent.log.2
-rwxrwxrwx 1 chhandak epbg 1048814 Mar 7 15:07 contrail-vrouter-agent.log.3
-rwxrwxrwx 1 chhandak epbg 1048814 Mar 7 15:07 contrail-vrouter-agent.log.4
-rwxrwxrwx 1 chhandak epbg 1048822 Mar 7 15:07 contrail-vrouter-agent.log.5
-rwxrwxrwx 1 chhandak epbg 29558098 Mar 7 15:07 contrail-vrouter-nodemgr-stderr.log
-rwxrwxrwx 1 chhandak epbg 618 Mar 7 15:07 contrail-vrouter-nodemgr-stdout.log
-rwxrwxrwx 1 chhandak epbg 3429 Mar 7 15:07 supervisord-vrouter.log
-rwxrwxrwx 1 chhandak epbg 229919445 Mar 29 20:46 contrail-vrouter-agent-41 ----> Binary

Ashish Ranjan (aranjan-n) wrote :

Please triage this.

Hari Prasad Killi (haripk) wrote :

The copied binary doesn't match the core file.

core:
{"build-info":[{"build-time":"2017-02-24 22:22:48.905957","build-hostname":"ubuntu-build04","build-user":"ymariappan","build-version":"4.0.0.0","build-id":"4.0.0.0-3041","build-number":"3041"}]}

from contrail-vrouter-agent-41 :
{"build-info": [{"build-time": "2017-03-29 18:43:11.062746", "build-hostname": "ubuntu-build04", "build-user": "ymariappan", "build-version": "4.0.0.0"}]}

Marking the bug as incomplete until we either get the matching binary or recreate the issue using a regular build.
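
For reference, the mismatch between the two build-info records can be checked mechanically. This is a minimal sketch, assuming the build-info JSON has already been extracted from the core and from the binary as shown above; `build_mismatches` is a hypothetical helper and the two strings are copied from this comment.

```python
import json

# build-info strings copied from this comment.
CORE_INFO = '{"build-info":[{"build-time":"2017-02-24 22:22:48.905957","build-hostname":"ubuntu-build04","build-user":"ymariappan","build-version":"4.0.0.0","build-id":"4.0.0.0-3041","build-number":"3041"}]}'
BINARY_INFO = '{"build-info": [{"build-time": "2017-03-29 18:43:11.062746", "build-hostname": "ubuntu-build04", "build-user": "ymariappan", "build-version": "4.0.0.0"}]}'


def build_mismatches(a, b, keys=("build-time", "build-id", "build-number")):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    info_a = json.loads(a)["build-info"][0]
    info_b = json.loads(b)["build-info"][0]
    return {
        k: (info_a.get(k), info_b.get(k))
        for k in keys
        if info_a.get(k) != info_b.get(k)
    }


diff = build_mismatches(CORE_INFO, BINARY_INFO)
for key, (core_val, bin_val) in diff.items():
    print(f"{key}: core={core_val} binary={bin_val}")
```

A missing key on one side is reported as `None`, which is how the absent build-id and build-number of the copied binary show up here.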

chhandak (chhandak) wrote :

Not seeing the issue on the recent build. Closing the bug for now.
