Juniper Openstack

All remaining kube-managers died after an HA failover

Bug #1712003 reported by Vedamurthy Joshi on 2017-08-21

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Juniper Openstack	Status tracked in Trunk
R4.0	Fix Committed	High	Yuvaraja Mariappan	r4.0.1.0
Trunk	Fix Committed	High	Yuvaraja Mariappan	r4.1.0.0-fcs ("r4.1")

Bug Description

Build: R4.0.1.0 continuous build 24 on Ubuntu 16.04.2, including the fixes for bug 1710744 and bug 1711274.

nodec1, nodec2 and nodec3 are the three controller nodes; each of them also runs a kube-manager container.

On nodec3, all containers were stopped at around 11:02:53 AM:

docker stop contrail-kube-manager ; docker stop analytics ; docker stop analyticsdb ; docker stop controller

The other two kube-managers died at the same time.
Surprisingly, no tracebacks or errors were seen during this period.

Logs will be in http://10.204.216.50/Docs/bugs/#

Restarting one of the kube-managers brought the kube-managers back up, and the ZooKeeper (zk) leader election also completed properly.

Tags:

Vedamurthy Joshi (vedujoshi) on 2017-08-23

tags:

added: blocker

Sachchidanand Vaidya (vaidyasd) on 2017-08-28

Changed in juniperopenstack:
assignee:	Sachchidanand Vaidya (vaidyasd) → Yuvaraja Mariappan (ymariappan)


OpenContrail Admin (ci-admin-f) wrote on 2017-08-28: [Review update] R4.0

Review in progress for https://review.opencontrail.org/34992
Submitter: Yuvaraja Mariappan


OpenContrail Admin (ci-admin-f) wrote on 2017-08-28: [Review update] master

Review in progress for https://review.opencontrail.org/34993
Submitter: Yuvaraja Mariappan


OpenContrail Admin (ci-admin-f) wrote on 2017-08-29: A change has been merged

Reviewed: https://review.opencontrail.org/34993
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/e473bcd61cc102fb1c31f09f42632e8e4a6fe032
Submitter: Zuul (<email address hidden>)
Branch: master

commit e473bcd61cc102fb1c31f09f42632e8e4a6fe032
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon Aug 28 09:55:52 2017 -0700

Fixed ha fail over issue for kube-managers

The restart policy should be 'always' in the
contrail-kube-manager.service. so that if
there is an exception happens in the zoo-keeper
client, systemd would take care of it

Change-Id: I7ab20dd7d6646e760f9fc452633efb776cd6b32f
Closes-bug: #1712003
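The commit message refers to the restart policy of contrail-kube-manager.service, but the unit file itself is not reproduced in this report. The fragment below is only a hedged sketch of what the relevant settings might look like: `Restart=always` reflects the committed fix, while the `ExecStart` path and the `RestartSec` value are assumptions for illustration. The distinction matters because with `Restart=on-failure` systemd does not restart a service whose main process exits cleanly (status 0), whereas `Restart=always` restarts it however it exited. That could plausibly cover a kube-manager that stops without logging a traceback, as observed here.

```ini
# Hypothetical sketch of contrail-kube-manager.service.
# Only Restart=always reflects the committed fix; other values are placeholders.
[Unit]
Description=Contrail kube-manager

[Service]
# Placeholder path; the real ExecStart line is not shown in this report.
ExecStart=/usr/bin/contrail-kube-manager
# Restart whenever the process exits, including clean exits (status 0).
Restart=always
# Assumed delay before each restart, to avoid a tight restart loop.
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After changing a unit file, `systemctl daemon-reload` is needed before the new policy takes effect, and `systemctl show -p Restart contrail-kube-manager.service` reports the value systemd is actually using (assuming the service is managed by systemd on the node or inside the container).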


OpenContrail Admin (ci-admin-f) wrote on 2017-08-29:

Reviewed: https://review.opencontrail.org/34992
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/6611036bc43dbaf259725de2cf08385082bc5e15
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 6611036bc43dbaf259725de2cf08385082bc5e15
Author: Yuvaraja Mariappan <email address hidden>
Date: Mon Aug 28 09:55:52 2017 -0700

Fixed ha fail over issue for kube-managers

The restart policy should be 'always' in the
contrail-kube-manager.service. so that if
there is an exception happens in the zoo-keeper
client, systemd would take care of it

Change-Id: I7ab20dd7d6646e760f9fc452633efb776cd6b32f
Closes-bug: #1712003
