[k8s-nested-R5.0]: Unstable k8s cluster in nested-mode provisioning: pod creation often fails, and the workaround is a kube-manager restart

Bug #1760202 reported by Pulkit Tandon
This bug affects 1 person
Affects              Status         Importance   Assigned to       Milestone
Juniper Openstack    (status tracked in Trunk)
  R5.0               Fix Released   Critical     Dinesh Bakiaraj
  Trunk              Fix Released   Critical     Dinesh Bakiaraj

Bug Description

Configuration:
K8s 1.9.2
contrail-5.0.0-25
CentOS 7.4

Nested Mode:
Setup:
1 Control + Openstack node
2 Compute nodes
3 VMs created using Nova on the compute nodes:
  1 VM as K8s master + Contrail Kube Manager
  2 VMs as K8s slaves + Contrail CNI

Description:
Pod creation often fails.
The workaround is to restart kube-manager.
Some time after the restart, pod creation fails again.

Log excerpt (kube-manager traceback):

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/kube_manager/vnc/vnc_kubernetes.py", line 518, in vnc_process
    self.pod_mgr.process(event)
  File "/usr/lib/python2.7/site-packages/kube_manager/vnc/vnc_pod.py", line 578, in process
    pod_namespace, pod_node, node_ip, labels, vm_vmi)
  File "/usr/lib/python2.7/site-packages/kube_manager/vnc/vnc_pod.py", line 427, in vnc_pod_update
    pod_node, node_ip, labels, vm_vmi)
  File "/usr/lib/python2.7/site-packages/kube_manager/vnc/vnc_pod.py", line 407, in vnc_pod_add
    self._create_iip(pod_name, pod_namespace, vn_obj, vmi)
  File "/usr/lib/python2.7/site-packages/kube_manager/vnc/vnc_pod.py", line 201, in _create_iip
    self._vnc_lib.instance_ip_create(iip_obj)
  File "/usr/lib/python2.7/site-packages/vnc_api/vnc_api.py", line 48, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vnc_api/vnc_api.py", line 520, in _object_create
    OP_POST, obj_cls.create_uri, data=json_body)
  File "/usr/lib/python2.7/site-packages/vnc_api/vnc_api.py", line 943, in _request_server
    retry_after_authn=retry_after_authn, retry_count=retry_count)
  File "/usr/lib/python2.7/site-packages/vnc_api/vnc_api.py", line 1050, in _request
    raise BadRequest(status, content)
BadRequest: Virtual-Network(['default-domain', 'admin', 'k8s-default-pod-network']) has no defined subnets

Pulkit Tandon (pulkitt)
tags: added: sanityblocker
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/41293
Submitter: Dinesh Bakiaraj (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/41433
Submitter: Dinesh Bakiaraj (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/41293
Committed: http://github.com/Juniper/contrail-controller/commit/008afd1eccdbde154f20a1baf0a4fd2f5588cd07
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 008afd1eccdbde154f20a1baf0a4fd2f5588cd07
Author: dineshb-jnpr <email address hidden>
Date: Mon Apr 2 19:48:59 2018 -0700

Avoid rabbit flow timeout in k8s Nested mode.

In K8s nested mode, kube-manager in the overlay connects to the Contrail
control plane in the underlay via a link-local service. TCP connections to
the rabbit service may time out if there is no activity on the link-local
connection. However, TCP connection teardown over link-local does not
bring down the connection on both ends, leaving connections dangling.
This code change prevents the flow timeout of the rabbit TCP
connection from kube-manager in K8s nested mode.

Change-Id: I8e012bb7da76e368c0f42cb9d461d0faa7c003eb
Closes-bug: #1760202
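The commit message says that an idle link-local TCP flow can time out. The following toy model is a hypothetical sketch, not vRouter code. It illustrates only the idle-timeout half of that explanation: a flow table that evicts entries idle for longer than `idle_timeout` seconds.

The class `FlowTable`, the function `simulate`, the flow key, and the 180-second timeout are all invented for illustration. The model shows why any periodic traffic sent more often than the timeout keeps the flow entry alive.

```python
"""Hypothetical toy model of idle-flow aging (not vRouter's implementation)."""


class FlowTable:
    """Evicts flows that have been idle for longer than idle_timeout seconds."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow key -> time of the last packet seen

    def packet(self, flow, now):
        # Any packet (data or keepalive) creates or refreshes the flow entry.
        self.last_seen[flow] = now

    def age(self, now):
        # Periodic sweep: remove every flow idle for longer than the timeout.
        expired = [f for f, t in self.last_seen.items()
                   if now - t > self.idle_timeout]
        for f in expired:
            del self.last_seen[f]
        return expired


def simulate(idle_timeout, keepalive_interval, duration):
    """Return True if an otherwise idle flow survives `duration` seconds.

    keepalive_interval=None models a connection that sends no traffic at all.
    """
    table = FlowTable(idle_timeout)
    flow = ("kube-manager", "rabbitmq")  # hypothetical flow key
    table.packet(flow, now=0)
    for now in range(1, duration + 1):
        if keepalive_interval and now % keepalive_interval == 0:
            table.packet(flow, now)
        table.age(now)
        if flow not in table.last_seen:
            return False
    return True


if __name__ == "__main__":
    print(simulate(idle_timeout=180, keepalive_interval=None, duration=600))
    print(simulate(idle_timeout=180, keepalive_interval=60, duration=600))
```

With no keepalive traffic the entry is evicted shortly after `idle_timeout` elapses. A keepalive interval shorter than the timeout keeps the entry alive indefinitely. In the real failure, the eviction is not signalled to both TCP endpoints, so kube-manager keeps a dangling rabbit connection. This model does not try to reproduce that part.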

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/41433
Committed: http://github.com/Juniper/contrail-controller/commit/93e8474d2ad5c14dffd3bee6987e20ed7a5f723d
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 93e8474d2ad5c14dffd3bee6987e20ed7a5f723d
Author: dineshb-jnpr <email address hidden>
Date: Mon Apr 2 19:48:59 2018 -0700

Avoid rabbit flow timeout in k8s Nested mode.

In K8s nested mode, kube-manager in the overlay connects to the Contrail
control plane in the underlay via a link-local service. TCP connections to
the rabbit service may time out if there is no activity on the link-local
connection. However, TCP connection teardown over link-local does not
bring down the connection on both ends, leaving connections dangling.
This code change prevents the flow timeout of the rabbit TCP
connection from kube-manager in K8s nested mode.

Change-Id: I8e012bb7da76e368c0f42cb9d461d0faa7c003eb
Closes-bug: #1760202
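One generic way to stop an idle TCP flow from timing out is to make the connection emit periodic traffic, for example with kernel TCP keepalive probes. This report does not say which mechanism the merged change uses. It could be TCP keepalive, AMQP heartbeats, or something else, so the following is only a sketch of the general technique.

The helper name `enable_tcp_keepalive` and its default values are hypothetical, not Contrail defaults. The `TCP_KEEP*` socket options are platform-specific (primarily Linux), so each one is applied only when the running Python exposes it.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Hypothetical helper: enable TCP keepalive probes on `sock`.

    idle/interval/count are illustrative values, not Contrail defaults.
    """
    # Ask the kernel to send keepalive probes on this connection while it is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between subsequent probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the kernel declares the peer dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    s = enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

For this approach to help, `idle` must be shorter than the idle timeout of whatever ages out the flow on the link-local path. The probes also let the kernel detect a peer that has silently gone away, which addresses the dangling-connection symptom described above.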

Pulkit Tandon (pulkitt) wrote :

Verified on build 5.0-16; the issue was not observed.
Closing the bug on the R5.0 branch.

Pulkit Tandon (pulkitt) wrote :

Recently provisioned with the following builds:
ocata-5.0-122
ocata-master-185
Closing the bug.
