All remaining kube-managers died after an HA failover

Bug #1712003 reported by Vedamurthy Joshi
Affects              Status                     Importance  Assigned to
Juniper Openstack    Status tracked in Trunk
  R4.0               Fix Committed              High        Yuvaraja Mariappan
  Trunk              Fix Committed              High        Yuvaraja Mariappan

Bug Description

Build: R4.0.1.0 continuous build 24 on Ubuntu 16.04.2, with fixes for bug 1710744 and bug 1711274.

nodec1, nodec2 and nodec3 are the three controller nodes; each also runs a kube-manager container.

On nodec3, all containers were stopped at around 11:02:53 AM with:

docker stop contrail-kube-manager ; docker stop analytics ; docker stop analyticsdb ; docker stop controller

The other two kube-managers (on nodec1 and nodec2) died at the same time.
Surprisingly, no tracebacks or errors were seen during this period.

Logs will be in http://10.204.216.50/Docs/bugs/#

Restarting one of the kube-managers brought them back up, and the ZooKeeper (zk) election then completed properly.

tags: added: blocker
Changed in juniperopenstack:
assignee: Sachchidanand Vaidya (vaidyasd) → Yuvaraja Mariappan (ymariappan)
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/34992
Submitter: Yuvaraja Mariappan

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/34993
Submitter: Yuvaraja Mariappan

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34993
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/e473bcd61cc102fb1c31f09f42632e8e4a6fe032
Submitter: Zuul (<email address hidden>)
Branch: master

commit e473bcd61cc102fb1c31f09f42632e8e4a6fe032
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon Aug 28 09:55:52 2017 -0700

Fixed ha fail over issue for kube-managers

The restart policy should be 'always' in the
contrail-kube-manager.service. so that if
there is an exception happens in the zoo-keeper
client, systemd would take care of it

Change-Id: I7ab20dd7d6646e760f9fc452633efb776cd6b32f
Closes-bug: #1712003

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34992
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/6611036bc43dbaf259725de2cf08385082bc5e15
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 6611036bc43dbaf259725de2cf08385082bc5e15
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon Aug 28 09:55:52 2017 -0700

Fixed ha fail over issue for kube-managers

The restart policy should be 'always' in the
contrail-kube-manager.service. so that if
there is an exception happens in the zoo-keeper
client, systemd would take care of it

Change-Id: I7ab20dd7d6646e760f9fc452633efb776cd6b32f
Closes-bug: #1712003
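
The commit message above says the restart policy in contrail-kube-manager.service should be 'always', so that systemd restarts the service after an exception in the ZooKeeper client. The merged change itself is not quoted in this report, so the following is only a minimal sketch of one way such a policy could be expressed, using a systemd drop-in. The drop-in file name (restart.conf) and the UNIT_DIR variable are assumptions made for illustration; the script writes to a scratch directory by default so that it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only, not the merged change: write a systemd drop-in that sets
# Restart=always for the unit named in the commit message.
set -eu

UNIT="contrail-kube-manager.service"

# Assumption: UNIT_DIR defaults to a scratch directory. On a real host the
# drop-in would normally live under /etc/systemd/system.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
DROPIN_DIR="$UNIT_DIR/$UNIT.d"
DROPIN="$DROPIN_DIR/restart.conf"   # hypothetical file name

mkdir -p "$DROPIN_DIR"

# Restart=always asks systemd to restart the service whenever it exits,
# whether cleanly or because of an unhandled exception.
cat > "$DROPIN" <<'EOF'
[Service]
Restart=always
EOF

echo "wrote $DROPIN"
```

If a drop-in like this were placed under /etc/systemd/system on the host or container that runs the service, `systemctl daemon-reload` followed by `systemctl show contrail-kube-manager.service -p Restart` should report the policy that is in effect.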
