K8s cluster deployment is stuck at kube_cluster_deploy OS::Heat::SoftwareDeployment

Bug #1846476 reported by BN
This bug affects 6 people
Affects        Status   Importance  Assigned to  Milestone
kolla-ansible  Invalid  Undecided   Unassigned

Bug Description

What happened: After following several guides to deploy a k8s cluster via Magnum, the cluster always gets stuck at "Create in progress" and, after the timeout, fails at kube_cluster_deploy (OS::Heat::SoftwareDeployment).

What you expected to happen: The k8s cluster should be deployed successfully.

How to reproduce it (minimal and precise): I tried 3 different guides:
https://cloudbase.it/easily-deploy-a-kubernetes-cluster-on-openstack/
http://www.panticz.de/magnum
https://docs.openstack.org/magnum/latest/user/

**Environment**:
* OS (e.g. from /etc/os-release): Ubuntu
* Kernel (e.g. `uname -a`): Linux host 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
* Docker version if applicable (e.g. `docker version`): 19.03.2
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): pip 19.2.3
* Docker image Install type (source/binary): source
* Docker image distribution: stein
* Are you using official images from Docker Hub or self built? official
* If self built - Kolla version and environment used to build:
* Share your inventory file, globals.yml and other configuration files if relevant

------------------------------------------------------------------------------------------------------
I have attached globals, heat & magnum conf files.

Thank you

BN (zatoichy) wrote :
Mark Goddard (mgoddard) wrote :

Have you set enable_cluster_user_trust to true in magnum.conf? There was a bug in magnum where deploys would fail without it. It has been fixed and backported to stein recently. See https://bugs.launchpad.net/kolla-ansible/+bug/1842449.

BN (zatoichy) wrote :

Hi Mark,

It is set to true in magnum.conf. The latest version of stein is used as well.

Michal Nasiadka (mnasiadka) wrote :

OK, can you please post the output showing the state of the Heat stack?

heat stack-list -n | grep cluster-name
heat resource-list failed-stack-name | grep "FAILED"
heat resource-show failed-stack-name failed-resource-name
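
On newer installations the standalone heat command is deprecated in favour of the unified openstack client. A rough equivalent of the checks above could look like the sketch below. The helper name show_stack_failures is hypothetical, the nesting depth of 5 is an arbitrary assumption, and the subcommands come from the python-heatclient plugin for the openstack client, so their availability may depend on the installed version.

```shell
#!/usr/bin/env bash
# Sketch only: inspect a failed Magnum stack with the unified client.
# show_stack_failures is a hypothetical helper, not part of any tool.
show_stack_failures() {
    local stack="$1"
    # List resources, descending into nested stacks (depth 5 is an assumption).
    openstack stack resource list --nested-depth 5 "$stack"
    # Summarise failed resources; --long asks for full deployment logs.
    openstack stack failures list --long "$stack"
}
```

Calling show_stack_failures with the name or ID of the failed stack should print the resource list followed by the failure summary, which for a SoftwareDeployment normally includes the output of the script that failed.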

BN (zatoichy) wrote :

http://prntscr.com/pty2dp - using the latest fedora_atomic image downloaded from official website.

BN (zatoichy) wrote :
BN (zatoichy) wrote :

Hi Michal,

I am still struggling to get my Magnum k8s cluster working. Here is the Heat output you asked for: https://pastebin.com/raw/maLabMvj

Basically, Heat fails at the master SoftwareDeployment phase - https://pastebin.com/QxMeBcAA

Magnum shows a similar error: 2020-04-03 23:46:20.823 6 ERROR magnum.drivers.heat.driver [req-93b5b5a6-54d0-4159-8329-86d3ef4b7b52 - - - - -] Nodegroup error, stack status: CREATE_FAILED, stack_id: cb1b86df-6d47-409a-bdd7-8f095cb2bd98, reason: Resource CREATE failed: Error: resources.kube_masters.resources[0].resources.master_config_deployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
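
The useful parts of that log line are the nested resource path and the exit code, because they say which deployment on which node to look at. A minimal sketch for pulling them out of a Magnum or Heat log is shown below. The function name parse_deploy_failure is hypothetical, and the pattern is written only against the message format quoted above, so other releases may word it differently.

```shell
#!/usr/bin/env bash
# Sketch only: read log lines on stdin and print "<resource path> <exit code>"
# for every "Deployment to server failed" message.
# parse_deploy_failure is a hypothetical helper name.
parse_deploy_failure() {
    sed -n -E 's/.*Error: (resources\.[^:]+): Deployment to server failed: .*status code: ([0-9]+).*/\1 \2/p'
}
```

Piping the magnum-conductor log through parse_deploy_failure should leave one short line per failure, here pointing at master_config_deployment on the first kube_masters member with exit code 1.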

P.S.

Kolla-ansible: 9.0, ubuntu-sources, train

# cat /etc/kolla/config/magnum.conf
[cinder]
default_docker_volume_type = _DEFAULT_

[trust]
cluster_user_trust = True

# cat /etc/kolla/config/heat.conf
[DEFAULT]
region_name_for_services = RegionOne

Michal Nasiadka (mnasiadka) wrote :

So that probably means some scripts run by heat on the provisioned servers failed - can you check cloud-init and heat agent logs on the instances?
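
As a starting point for that check, the sketch below lists which of the usual log locations exist on a node. The paths are assumptions based on common Magnum images and vary by image and release, and list_deploy_logs is a hypothetical helper name. On images that run the heat agent as a container, its output is often in the journal instead (for example under a unit named heat-container-agent, which is also an assumption).

```shell
#!/usr/bin/env bash
# Sketch only: print the candidate log paths that exist under ROOT.
# The path list is an assumption; adjust it for the image in use.
# list_deploy_logs is a hypothetical helper name.
list_deploy_logs() {
    local root="${1:-}"   # empty means the real filesystem root
    local path
    for path in \
        /var/log/cloud-init.log \
        /var/log/cloud-init-output.log \
        /var/log/heat-config/heat-config-script; do
        if [ -e "${root}${path}" ]; then
            printf '%s\n' "${root}${path}"
        fi
    done
}
```

Run without arguments on the master node, it prints the files and directories worth reading first. The optional argument only exists so the helper can be pointed at a copied log tree.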

BN (zatoichy) wrote :
Deepa (dpaclt) wrote :

I am also facing the same issue. Is this fixed?

Till Plüer (tplueer) wrote :

I am currently facing the same issue. I think it has something to do with the auth URL parameter that is passed to the Heat stack.

In keystone-apache-public-access.log I can see that the following endpoint is called during the deployment, but it returns 404:

"POST /auth/tokens HTTP/1.1" 404 232 10372 "-" "curl/7.69.1"

Can you check the auth url parameter in your heat stack?
Mine is set to: https://xxxxxxxxx:5000

/v3/auth/tokens seems to work.
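
If the 404 really comes from an auth URL without a version suffix, the fix is to make sure the URL handed to the stack ends in /v3. A minimal sketch of that check follows. The function name ensure_v3_suffix and the example hostname are hypothetical, and /v3 is assumed only because /v3/auth/tokens is the path reported to work above.

```shell
#!/usr/bin/env bash
# Sketch only: append /v3 to a Keystone URL that lacks it.
# ensure_v3_suffix is a hypothetical helper name.
ensure_v3_suffix() {
    local url="${1%/}"    # drop a single trailing slash, if present
    case "$url" in
        */v3) printf '%s\n' "$url" ;;       # already versioned, keep as is
        *)    printf '%s/v3\n' "$url" ;;    # unversioned, add the suffix
    esac
}
```

For example, ensure_v3_suffix https://keystone.example:5000 prints https://keystone.example:5000/v3, while a URL that already ends in /v3 is returned unchanged.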

Satish Patel (satish-txt) wrote :

I am having the same issue with an Ussuri deployment; my cluster stack hangs at kube_cluster_deploy

http://paste.openstack.org/show/796895/

Did you guys find any solution?

Satish Patel (satish-txt) wrote :

This is what I found: if you are using --tls-disabled, the cluster gets stuck at kube_cluster_deploy. As soon as I removed --tls-disabled, it worked.

naman (naman1998) wrote :

I am also facing the same issue and have tried all the approaches mentioned above.

Michal Nasiadka (mnasiadka) wrote :

This bug should be raised in Magnum

Changed in kolla-ansible:
status: New → Invalid