[Openshift/K8S] : When configuring SNAT router host losing connectivity

Bug #1735590 reported by chhandak
This bug affects 2 people
Affects              Status         Importance  Assigned to         Milestone
Juniper Openstack    (status tracked in Trunk)
  R4.1               Fix Committed  High        Yuvaraja Mariappan
  R5.0               Fix Committed  High        Yuvaraja Mariappan
  Trunk              Fix Committed  High        Yuvaraja Mariappan

Bug Description

When we create a SNAT router and extend the cluster network to it, the host loses all connectivity.

In 4.1, kube-manager creates a policy between the IP fabric network and the cluster network by default. When the cluster network is then extended to a SNAT router, all underlay traffic is dropped and the host loses connectivity.

Workaround: Disassociating the ip-fabric-cluster-network-default policy and then deleting it solves the problem. So to use the SNAT feature in Contrail, first disassociate and delete the ip-fabric-cluster-network-default policy.
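
For anyone scripting the workaround, the two steps (detach the policy from each network, then delete it) could look roughly like the sketch below. It is only an illustration: the `client` argument is assumed to behave like a Contrail `VncApi` object, and the method names used here (`network_policy_read`, `virtual_network_read`, `del_network_policy`, `virtual_network_update`, `network_policy_delete`) should be checked against the VNC API version in your release. The function name and its parameters are hypothetical.

```python
def detach_and_delete_policy(client, vn_fq_names, policy_fq_name):
    """Detach a network policy from each virtual network, then delete it.

    `client` is assumed to expose VncApi-style methods; this helper is
    a hypothetical illustration, not code taken from Contrail itself.
    """
    policy = client.network_policy_read(fq_name=policy_fq_name)

    # Step 1: remove the policy reference from every network that carries it.
    for vn_fq_name in vn_fq_names:
        vn = client.virtual_network_read(fq_name=vn_fq_name)
        vn.del_network_policy(policy)
        client.virtual_network_update(vn)

    # Step 2: delete the policy only after all references are gone.
    client.network_policy_delete(fq_name=policy_fq_name)
```

The caller supplies the fully qualified names of the IP fabric network, the cluster network, and the ip-fabric-cluster-network-default policy for the deployment; the domain and project parts of those names differ between setups and are not given in this report.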

chhandak (chhandak)
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Sachchidanand Vaidya (vaidyasd)
milestone: none → r4.1.0.0-fcs
information type: Proprietary → Public
summary: - [Openshift] : When configuring SNAT router host loosing connectivity
+ [Openshift/K8S] : When configuring SNAT router host loosing
+ connectivity
Revision history for this message
Jeba Paulaiyan (jebap) wrote : Re: [Openshift/K8S] : When configuring SNAT router host loosing connectivity

Release notes:

In Kubernetes and OpenShift based deployments, when we create a SNAT router and extend the cluster network to it, the host loses all connectivity.

Workaround: Disassociating the ip-fabric-cluster-network-default policy and then deleting it solves the problem. So to use the SNAT feature in Contrail, first disassociate and delete the ip-fabric-cluster-network-default policy.

Revision history for this message
Pulkit Tandon (pulkitt) wrote :

This issue occurs sporadically.
In the recent sanity run on CB build 5.0.0-58, the issue was not observed.
The report is here:
http://10.204.216.50/Docs/logs/5.0.0-58_2017_12_12_00_57_55_1513026170.38/junit-noframes.html

But the issue was observed in CB build 5.0.0-59.

The effect of this failure on the sanity run is severe.
Hence, I am commenting out the SNAT test case in the k8s suite until the fix is released.

Nischal Sheth (nsheth)
summary: - [Openshift/K8S] : When configuring SNAT router host loosing
+ [Openshift/K8S] : When configuring SNAT router host losing
connectivity
Revision history for this message
Michael Ng (acerinop) wrote :

May I know what the implications of disassociating and removing the ip-fabric-cluster-network-default policy are? What was the original objective of implementing it in v4.1? I'm encountering this issue in 4.1 too, but I wasn't aware of this bug earlier and implemented the workaround below:

- When the entire cluster is provisioned with the primary NIC that vhost is associated with, and that NIC has a default gateway configured, the node IP becomes inaccessible as soon as the SNAT router is enabled, and the K8s master declares the worker nodes lost due to the disconnection.

- The workaround is to remove the default gateway configuration from the host's NIC and then provision the entire cluster. Once provisioning has completed, manually add the "gateway =" option to the vRouter agent configuration; with that in place, enabling the SNAT router does not bring down the node.

The drawback of my workaround is that the "gateway =" option is lost when the pod is restarted, since it was applied directly in the container.
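
Since the "gateway =" edit has to be reapplied after every pod restart, it may help to script it. Below is a minimal sketch that sets the option in an INI-style agent configuration. The section name `VIRTUAL-HOST-INTERFACE` is an assumption about where the vRouter agent reads the option, and the function name is hypothetical; check the agent configuration file in your release before relying on it.

```python
import configparser
import io


def set_vhost_gateway(conf_text, gateway, section="VIRTUAL-HOST-INTERFACE"):
    """Return conf_text with `gateway` set in the given section.

    The default section name is an assumption, not confirmed by this
    report. Note that configparser drops comments when rewriting.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # Create the section if the file does not have it yet.
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "gateway", gateway)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Running this from whatever starts the agent container would keep the setting across restarts, though it remains a stopgap until the fix is released.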

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/43327
Submitter: Yuvaraja Mariappan

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/43328
Submitter: Yuvaraja Mariappan

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/43329
Submitter: Yuvaraja Mariappan

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/43329
Committed: http://github.com/Juniper/contrail-controller/commit/86ddbbe2fa62d7a028f1014b69a185879d90ce19
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 86ddbbe2fa62d7a028f1014b69a185879d90ce19
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon May 28 02:29:29 2018 -0700

Route updates for default route in ip_fabric vrf should not be done

In k8s, network policy is enabled between pod-network and ip-fabric
network. When logical router is enabled for pod-network for snat,
default route would be injected in pod-network. Due to the policy,
it is updated to ip-fabric vrf which inturn causes host unreachablity.

Some routes in ip-fabric vrf specific to nodes which have to be
protected from being updated by bgp peers. Added code to ignore
updates for default route, vhost route, vhost subnet route in
ip-fabric vrf.

Change-Id: I22bba1be6106896b07c7d07d95d810eebb079ea1
Closes-bug: #1735590
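
To illustrate the rule described in the commit message above, here is a small sketch of the check: a route update arriving in the ip-fabric VRF is ignored when it targets the default route, the vhost route, or the vhost subnet route. This is not the actual agent code (the real change is in contrail-controller); the function name and parameters are hypothetical and only model the decision.

```python
import ipaddress


def is_protected_fabric_route(prefix, vhost_ip, vhost_subnet):
    """Return True if an update for `prefix` in the ip-fabric VRF should be ignored.

    Hypothetical model of the rule in the commit message: protect the
    default route, the vhost host route, and the vhost subnet route.
    """
    route = ipaddress.ip_network(prefix, strict=False)

    # Default route (0.0.0.0/0 or ::/0).
    if route.prefixlen == 0:
        return True

    # Host route for the vhost address (/32 for IPv4, /128 for IPv6).
    if route == ipaddress.ip_network(vhost_ip):
        return True

    # Route covering exactly the vhost subnet.
    if route == ipaddress.ip_network(vhost_subnet, strict=False):
        return True

    return False
```

For example, with a vhost address of 10.0.0.5 in 10.0.0.0/24, updates for 0.0.0.0/0, 10.0.0.5/32 and 10.0.0.0/24 would be ignored, while an update for 192.168.1.0/24 would be accepted.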

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/43328
Committed: http://github.com/Juniper/contrail-controller/commit/db0f7f780c834228c6fb9405d8712b47f46890a7
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit db0f7f780c834228c6fb9405d8712b47f46890a7
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon May 28 02:29:29 2018 -0700

Route updates for default route in ip_fabric vrf should not be done

In k8s, network policy is enabled between pod-network and ip-fabric
network. When logical router is enabled for pod-network for snat,
default route would be injected in pod-network. Due to the policy,
it is updated to ip-fabric vrf which inturn causes host unreachablity.

Some routes in ip-fabric vrf specific to nodes which have to be
protected from being updated by bgp peers. Added code to ignore
updates for default route, vhost route, vhost subnet route in
ip-fabric vrf.

Change-Id: I22bba1be6106896b07c7d07d95d810eebb079ea1
Closes-bug: #1735590

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/43327
Committed: http://github.com/Juniper/contrail-controller/commit/d6e6e55d9bfe95dbc3a18c2f9e34ada2306b6938
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit d6e6e55d9bfe95dbc3a18c2f9e34ada2306b6938
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon May 28 02:29:29 2018 -0700

Route updates for default route in ip_fabric vrf should not be done

In k8s, network policy is enabled between pod-network and ip-fabric
network. When logical router is enabled for pod-network for snat,
default route would be injected in pod-network. Due to the policy,
it is updated to ip-fabric vrf which inturn causes host unreachablity.

Some routes in ip-fabric vrf specific to nodes which have to be
protected from being updated by bgp peers. Added code to ignore
updates for default route, vhost route, vhost subnet route in
ip-fabric vrf.

Change-Id: I22bba1be6106896b07c7d07d95d810eebb079ea1
Closes-bug: #1735590
