k8s_5.0: controller node reboot fails to re-establish the BGP XMPP connection with compute nodes

Bug #1766035 reported by Venkatesh Velpula on 2018-04-21
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
Juniper Openstack    Status tracked in Trunk
  R5.0               Fix Released   Critical     alexey-mr
  Trunk              Fix Released   Critical     alexey-mr

Bug Description

Workaround: clear the iptables rules on the master.
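The report does not say which commands were used to clear the rules. A minimal sketch, assuming the workaround amounts to flushing the filter table with `iptables -F` (the helper names `flush_commands` and `run`, and the dry-run default, are illustrative and not from the report):

```python
import shlex
import subprocess


def flush_commands(tables=("filter",)):
    """Return one 'iptables -t <table> -F' invocation per table (assumed workaround)."""
    return [["iptables", "-t", table, "-F"] for table in tables]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (requires root)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: shows what would be executed on the master.
    run(flush_commands())
```

Flushed rules are not persistent: a firewall service that is still enabled will reinstall its rules on the next reboot.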

build: 5.0.25
deployer: ansible-deployer

setup:
controller/master: nodei24
compute/minion: nodei25

On the compute node after rebooting the master
========================================================
[root@nodei25 ~]# contrail-status
Pod Service Original Name State Status
vrouter agent contrail-vrouter-agent running Up 2 hours
vrouter nodemgr contrail-nodemgr running Up 2 hours

vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: initializing (Collector connection down)
agent: initializing (XMPP:control-node:10.204.217.136, XMPP:dns-server:10.204.217.136, Collector connection down)
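For scripted checks, the summary lines above can be split into service, state, and the list of connections reported as down. This is a sketch that assumes the `name: state (a, b, c connection down)` layout shown in this output; `parse_status_line` is a hypothetical helper, not part of `contrail-status`:

```python
import re


def parse_status_line(line):
    """Split 'svc: state (A, B, C connection down)' into (svc, state, [A, B, C])."""
    match = re.match(r"\s*(\w+):\s*(\w+)\s*\((.*)\)\s*$", line)
    if match is None:
        return None
    service, state, detail = match.groups()
    # Drop the trailing phrase so only the connection names remain.
    detail = re.sub(r"\s*connection down$", "", detail)
    connections = [part.strip() for part in detail.split(",") if part.strip()]
    return service, state, connections


if __name__ == "__main__":
    line = ("agent: initializing (XMPP:control-node:10.204.217.136, "
            "XMPP:dns-server:10.204.217.136, Collector connection down)")
    print(parse_status_line(line))
```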

============================================================
vrouter agent logs
============================================================

2018-04-22 Sun 04:19:53:295.646 IST nodei25 [Thread 140402294712064, Pid 28813]: SANDESH: Send FAILED: 1524350993291469 [SYS_NOTICE]: NodeStatusUVE: data= [ name = nodei25 process_status= [ [ [ module_id = contrail-vrouter-agent instance_id = 0 state = Non-Functional connection_infos= [ [ [ type = XMPP name = control-node:10.204.217.136 server_addrs= [ [ (*_iter6) = 10.204.217.136:5269, ] ] status = Down description = Connect ], [ type = XMPP name = dns-server:10.204.217.136 server_addrs= [ [ (*_iter6) = 10.204.217.136:53, ] ] status = Down description = Connect ], [ type = Collector name = server_addrs= [ [ (*_iter6) = 10.204.217.136:8086, ] ] status = Initializing description = Idle : EvIdleHoldTimerExpired -> Connect ], ] ] description = XMPP:control-node:10.204.217.136, XMPP:dns-server:10.204.217.136, Collector connection down ], ] ] ]
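The log entry lists each connection with its server address and status. A sketch for pulling those out and, optionally, probing the ports from the compute node follows. It assumes the `connection_infos` layout seen in this one log line; `parse_connections` and `tcp_reachable` are hypothetical helpers, and a plain TCP connect only shows whether the port accepts connections, not whether XMPP itself works:

```python
import re
import socket

# Matches one "[ type = ... status = ... ]" entry as printed in the log above.
CONN = re.compile(
    r"type = (?P<type>\w+) name =\s*(?P<name>\S*?)\s*server_addrs= \[ \[ "
    r"\(\*_iter\d+\) = (?P<host>[\d.]+):(?P<port>\d+), \] \] "
    r"status = (?P<status>\w+)"
)

SAMPLE = (
    "connection_infos= [ [ [ type = XMPP name = control-node:10.204.217.136 "
    "server_addrs= [ [ (*_iter6) = 10.204.217.136:5269, ] ] status = Down "
    "description = Connect ], [ type = XMPP name = dns-server:10.204.217.136 "
    "server_addrs= [ [ (*_iter6) = 10.204.217.136:53, ] ] status = Down "
    "description = Connect ], [ type = Collector name = server_addrs= [ [ "
    "(*_iter6) = 10.204.217.136:8086, ] ] status = Initializing "
    "description = Idle : EvIdleHoldTimerExpired -> Connect ], ] ]"
)


def parse_connections(log_text):
    """Return (type, name, host, port, status) for each connection entry."""
    return [
        (m["type"], m["name"], m["host"], int(m["port"]), m["status"])
        for m in CONN.finditer(log_text)
    ]


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for conn in parse_connections(SAMPLE):
        print(conn)
```

Calling `tcp_reachable(host, port)` for each parsed entry from the compute node gives a quick indication of whether the master is refusing or rejecting the connection attempts.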

==============================================================
iptables rules after reboot of the master
===============================================================
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
INPUT_direct all -- anywhere anywhere
INPUT_ZONES_SOURCE all -- anywhere anywhere
INPUT_ZONES all -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain FORWARD (policy DROP)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forward rules */
DOCKER-ISOLATION all -- anywhere anywhere
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
FORWARD_direct all -- anywhere anywhere
FORWARD_IN_ZONES_SOURCE all -- anywhere anywhere
FORWARD_IN_ZONES all -- anywhere anywhere
FORWARD_OUT_ZONES_SOURCE all -- anywhere anywhere
FORWARD_OUT_ZONES all -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
OUTPUT_direct all -- anywhere anywhere

Chain DOCKER (1 references)
target prot opt source destination

Chain DOCKER-ISOLATION (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere

Chain FORWARD_IN_ZONES (1 references)
target prot opt source destination
FWDI_public all -- anywhere anywhere [goto]
FWDI_public all -- anywhere anywhere [goto]

Chain FORWARD_IN_ZONES_SOURCE (1 references)
target prot opt source destination

Chain FORWARD_OUT_ZONES (1 references)
target prot opt source destination
FWDO_public all -- anywhere anywhere [goto]
FWDO_public all -- anywhere anywhere [goto]

Chain FORWARD_OUT_ZONES_SOURCE (1 references)
target prot opt source destination

Chain FORWARD_direct (1 references)
target prot opt source destination

Chain FWDI_public (2 references)
target prot opt source destination
FWDI_public_log all -- anywhere anywhere
FWDI_public_deny all -- anywhere anywhere
FWDI_public_allow all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere

Chain FWDI_public_allow (1 references)
target prot opt source destination

Chain FWDI_public_deny (1 references)
target prot opt source destination

Chain FWDI_public_log (1 references)
target prot opt source destination

Chain FWDO_public (2 references)
target prot opt source destination
FWDO_public_log all -- anywhere anywhere
FWDO_public_deny all -- anywhere anywhere
FWDO_public_allow all -- anywhere anywhere

Chain FWDO_public_allow (1 references)
target prot opt source destination

Chain FWDO_public_deny (1 references)
target prot opt source destination

Chain FWDO_public_log (1 references)
target prot opt source destination

Chain INPUT_ZONES (1 references)
target prot opt source destination
IN_public all -- anywhere anywhere [goto]
IN_public all -- anywhere anywhere [goto]

Chain INPUT_ZONES_SOURCE (1 references)
target prot opt source destination

Chain INPUT_direct (1 references)
target prot opt source destination

Chain IN_public (2 references)
target prot opt source destination
IN_public_log all -- anywhere anywhere
IN_public_deny all -- anywhere anywhere
IN_public_allow all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere

Chain IN_public_allow (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh ctstate NEW

Chain IN_public_deny (1 references)
target prot opt source destination

Chain IN_public_log (1 references)
target prot opt source destination

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000

Chain KUBE-SERVICES (2 references)
target prot opt source destination
REJECT udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable

Chain OUTPUT_direct (1 references)
target prot opt source destination
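One way to read the listing above is to collect the destination ports that a given chain explicitly accepts. The sketch below does that for `iptables -L` style text; `accepted_ports` is a hypothetical helper. In this listing the only per-port ACCEPT in `IN_public_allow` is for ssh, and `INPUT` ends in a REJECT, which is consistent with the firewall blocking the Contrail ports. That reading cannot be confirmed from the listing alone, because `iptables -L` without `-v` does not show interface matches.

```python
import re

SAMPLE = """\
Chain IN_public (2 references)
target prot opt source destination
IN_public_log all -- anywhere anywhere
IN_public_deny all -- anywhere anywhere
IN_public_allow all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere

Chain IN_public_allow (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh ctstate NEW
"""


def accepted_ports(listing, chain):
    """Return the 'dpt:' values of ACCEPT rules that belong to the given chain."""
    ports = []
    current = None
    for line in listing.splitlines():
        header = re.match(r"Chain (\S+) \(", line)
        if header:
            current = header.group(1)
            continue
        if current == chain and line.startswith("ACCEPT"):
            ports.extend(re.findall(r"dpt:(\S+)", line))
    return ports


if __name__ == "__main__":
    print(accepted_ports(SAMPLE, "IN_public_allow"))
```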

tags: added: contrail-control
removed: vrouter
Jeba Paulaiyan (jebap) on 2018-04-22
tags: added: releasenote
Nikhil Bansal (nikhilb-u) wrote :

I don't see the system in this state. Also, I don't see any controller log info. Did you collect controller XMPP traces?

Venkatesh Velpula (vvelpula) wrote :

Hi Nikhil, the setup is in the problem state.

Nikhil Bansal (nikhilb-u) wrote :

It seems that the behavior differs between a docker restart and a node reboot. We cannot even access the introspect page of the control node after a node restart. Somewhere, iptables flushing is not happening. Perhaps someone from provisioning could look into it further.

Review in progress for https://review.opencontrail.org/42603
Submitter: alexey-mr (<email address hidden>)

Review in progress for https://review.opencontrail.org/42604
Submitter: alexey-mr (<email address hidden>)

Review in progress for https://review.opencontrail.org/42605
Submitter: alexey-mr (<email address hidden>)

Review in progress for https://review.opencontrail.org/42606
Submitter: alexey-mr (<email address hidden>)

Review in progress for https://review.opencontrail.org/42605
Submitter: Andrey Pavlov (<email address hidden>)

Review in progress for https://review.opencontrail.org/42606
Submitter: Andrey Pavlov (<email address hidden>)

Reviewed: https://review.opencontrail.org/42606
Committed: http://github.com/Juniper/contrail-ansible-deployer/commit/73ce1dec0a5eb8234ba1cec561e4a15c04b890fb
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 73ce1dec0a5eb8234ba1cec561e4a15c04b890fb
Author: alexey-mr <email address hidden>
Date: Fri Apr 27 22:29:03 2018 +0300

Disable firewall service

This is to keep setup working
in case of reboot.

Change-Id: I6f741daa5e52ba71f67cfd15333e42fda8d29a50
Closes-Bug: #1766035

Reviewed: https://review.opencontrail.org/42603
Committed: http://github.com/Juniper/contrail-container-builder/commit/d8ec3572a56a8d8bac2b9c68034d1b33142938a2
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit d8ec3572a56a8d8bac2b9c68034d1b33142938a2
Author: alexey-mr <email address hidden>
Date: Fri Apr 27 22:07:48 2018 +0300

Disable firewalld/ufw in setup-k8s.sh

Change-Id: I14a4ec3c6dc4c6a0ebb8170af0c96ab9fe9d419c
Partial-Bug: #1766035
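The commit body is not reproduced in this report. As a rough sketch of what disabling these services typically involves, assuming a systemd-managed firewalld and the `ufw` command-line tool (the function name and structure are hypothetical and not taken from `setup-k8s.sh`):

```python
import shlex


def disable_commands(service):
    """Return the commands commonly used to stop and disable a host firewall."""
    if service == "firewalld":
        # Stop the running daemon, then keep it from starting at boot.
        return [["systemctl", "stop", "firewalld"],
                ["systemctl", "disable", "firewalld"]]
    if service == "ufw":
        # 'ufw disable' unloads the firewall and disables it at boot.
        return [["ufw", "disable"]]
    raise ValueError(f"unknown firewall service: {service}")


if __name__ == "__main__":
    # Print only; running these requires root on the target host.
    for name in ("firewalld", "ufw"):
        for cmd in disable_commands(name):
            print(shlex.join(cmd))
```

Disabling the service at boot, rather than only flushing the rules, is what keeps the setup working across a reboot.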

Reviewed: https://review.opencontrail.org/42605
Committed: http://github.com/Juniper/contrail-ansible-deployer/commit/00029323f2918877ddc4246a4512d33fa8d21a7b
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 00029323f2918877ddc4246a4512d33fa8d21a7b
Author: alexey-mr <email address hidden>
Date: Fri Apr 27 22:29:03 2018 +0300

Disable firewall service

This is to keep setup working
in case of reboot.

Change-Id: I6f741daa5e52ba71f67cfd15333e42fda8d29a50
Closes-Bug: #1766035

Reviewed: https://review.opencontrail.org/42604
Committed: http://github.com/Juniper/contrail-container-builder/commit/f2845420f834c0449c9e0334b4ab54a52b632fd4
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit f2845420f834c0449c9e0334b4ab54a52b632fd4
Author: alexey-mr <email address hidden>
Date: Fri Apr 27 22:07:48 2018 +0300

Disable firewalld/ufw in setup-k8s.sh

Change-Id: I14a4ec3c6dc4c6a0ebb8170af0c96ab9fe9d419c
Partial-Bug: #1766035

Venkatesh Velpula (vvelpula) wrote :

verified on ocata-master-174
