[Kubernetes] cluster deployment fails

Bug #1613691 reported by Alex Kholkin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
murano-apps | Fix Released | High | Artem Silenkov |

Bug description:
Deploying a Kubernetes cluster currently fails intermittently with the error shown under Actual results. The failure does not depend on a particular image (Ubuntu or Debian). The scenario below reliably makes one of the two environments fail to deploy. Deploying a single environment with a Kubernetes cluster also hits this problem occasionally.

Steps to reproduce:
1) Deploy devstack from master
2) Import Kubernetes cluster from a.o.o
3) Go to Applications > Catalog > Environments
4) Create 2 envs and add a Kubernetes cluster to each. Deploy one with ubuntu14.04-x64-kubernetes.qcow2 and the other with debian8-x64-kubernetes.qcow2
5) Deploy both envs

Expected results:
Both envs deploy successfully.

Actual results:
One of the envs fails with the error:

2016-08-16 12:00:59 — [murano.engine.system.agent.AgentException]: {'errorCode': 100, 'message': u'Script setup returned error code', 'extra': None, 'details': {u'stdout': u'Adding member kube-2 to etcd cluster\ncluster may be unhealthy: failed to list members\nMember kube-2 not added\nmember 607f0f2af09d2fbf is healthy: got healthy result from http://10.0.95.10:4001\ncluster is healthy\nMember kube-2 has been added\n\nAdding member kube-3 to etcd cluster\nmember 607f0f2af09d2fbf is healthy: got healthy result from http://10.0.95.10:4001\nmember fb561de2b41d86de is healthy: got healthy result from http://10.0.95.13:4001\ncluster is healthy\nMember kube-3 has been added\n\nAdding member gateway-1 to etcd cluster\nmember 607f0f2af09d2fbf is healthy: got healthy result from http://10.0.95.10:4001\nmember 98b9eb6df8b5a7f2 is healthy: got healthy result from http://10.0.95.5:4001\nmember fb561de2b41d86de is healthy: got healthy result from http://10.0.95.13:4001\ncluster is healthy\nMember gateway-1 not added. Reason: etcdserver: peerURL exists', u'stderr': None, u'exitCode': 1}, 'time': u'2016-08-16 11:59:27.414875'}
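
The stdout in the error above is easier to read once it is reduced to one final status per member. The following is a minimal sketch that does this, assuming the message wording matches the lines in this log ("Member X not added", "Member X has been added"). The function name `summarize_member_adds` and the shortened `sample` text are illustrative and not part of murano or the Kubernetes app.

```python
import re

# Patterns follow the wording seen in the agent stdout of this report.
# The "Reason: ..." suffix is optional and captured when present.
NOT_ADDED = re.compile(r"^Member (?P<name>\S+) not added(?:\. Reason: (?P<reason>.*))?$")
ADDED = re.compile(r"^Member (?P<name>\S+) has been added$")


def summarize_member_adds(stdout):
    """Return {member: (status, reason)} where status is "added" or "failed"."""
    result = {}
    for line in stdout.splitlines():
        line = line.strip()
        match = NOT_ADDED.match(line)
        if match:
            result[match.group("name")] = ("failed", match.group("reason"))
            continue
        match = ADDED.match(line)
        if match:
            # A later success overrides an earlier failed attempt (as with kube-2).
            result[match.group("name")] = ("added", None)
    return result


# Shortened excerpt of the stdout from the error above.
sample = (
    "Adding member kube-2 to etcd cluster\n"
    "cluster may be unhealthy: failed to list members\n"
    "Member kube-2 not added\n"
    "cluster is healthy\n"
    "Member kube-2 has been added\n"
    "\n"
    "Adding member gateway-1 to etcd cluster\n"
    "cluster is healthy\n"
    "Member gateway-1 not added. Reason: etcdserver: peerURL exists"
)
summary = summarize_member_adds(sample)
print(summary)
```

On this excerpt, kube-2 ends up added after one failed attempt, while gateway-1 stays failed with the reason "etcdserver: peerURL exists", which suggests its peer URL was already registered when the add was attempted.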

Reproducibility:
Always

Additional information:
Part of the murano-engine.log
https://paste.mirantis.net/show/2542/

Dmytro Dovbii (ddovbii) wrote :

etcd issues usually appear when your lab is slow. We are currently working on a fix that makes etcd more stable on slow envs too.
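
One way to make a member-add step tolerant of a slow or briefly unhealthy cluster is to retry it and to treat an already-registered peer URL as success. The sketch below illustrates that general idea only; it is not the fix referenced in this thread. The function `ensure_member`, the injected callables `list_peer_urls` and `add_member`, and the peer URLs in the usage part are all hypothetical stand-ins.

```python
import time


def ensure_member(name, peer_url, list_peer_urls, add_member, retries=5, delay=2.0):
    """Add an etcd member unless its peer URL is already registered.

    list_peer_urls() returns the currently registered peer URLs and may raise
    while the cluster is unhealthy. add_member(name, peer_url) raises on failure.
    Both callables are hypothetical placeholders for the real cluster access.
    """
    last_error = None
    for _ in range(retries):
        try:
            if peer_url in set(list_peer_urls()):
                return "already-present"
            add_member(name, peer_url)
            return "added"
        except Exception as exc:  # transient failure, e.g. "failed to list members"
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(
        "could not add %s after %d attempts: %s" % (name, retries, last_error)
    )


# Usage with in-memory fakes; the addresses are made up for illustration.
registered = {"http://10.0.95.10:7001"}
calls = {"list": 0}


def fake_list():
    calls["list"] += 1
    if calls["list"] == 1:
        raise IOError("failed to list members")  # first call simulates a slow cluster
    return registered


def fake_add(name, peer_url):
    registered.add(peer_url)


first = ensure_member("gateway-1", "http://10.0.95.5:7001", fake_list, fake_add, delay=0)
second = ensure_member("gateway-1", "http://10.0.95.5:7001", fake_list, fake_add, delay=0)
print(first, second)
```

Checking the member list before adding means a rerun of the setup script does not fail with "peerURL exists" when an earlier attempt already registered the member.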

Victor Ryzhenkin (vryzhenkin) wrote :

@Dmytro, our lab has 24 vCPUs, 64 GB RAM, and SSD drives. It does not look slow.

Changed in murano-apps:
status: Confirmed → In Progress
assignee: nobody → Artem Silenkov (asilenkov)
Dmytro Dovbii (ddovbii) wrote :

Should be fixed by https://review.openstack.org/#/c/355796/
The images on a.o.o have also been updated.

Could you please re-verify?

Alex Kholkin (akholkin) wrote :

I followed the same steps and the deployment succeeded.

Changed in murano-apps:
status: In Progress → Fix Released