Magnum cluster takes forever to create with status 'kube_masters create in progress'

Bug #1655007 reported by Murali Annamneni
This bug affects 6 people
Affects: Magnum
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am trying to create a Kubernetes cluster using Magnum + Heat + the Fedora-Atomic image, but the Heat stack creation takes forever. When I check the Heat resource list, it is blocked (CREATE_IN_PROGRESS) at "OS::Heat::ResourceGroup".
The compute instance was created successfully and a Cinder volume was attached to it.
I didn't see any error logs from neutron/nova_compute/cinder_volume.
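To find which nested resource is actually stuck, the resource listing can be expanded into the nested stacks. A sketch using the heat CLI of this era (stack name taken from this report; the depth of 5 is an arbitrary choice):

```shell
# Recurse into nested stacks to find the lowest-level resource stuck in
# CREATE_IN_PROGRESS (the debug logs below show a HeatWaitCondition task).
heat resource-list -n 5 k8s-cluster1-3oxprdrgdckv

# Equivalent with the unified OpenStack client:
# openstack stack resource list --nested-depth 5 k8s-cluster1-3oxprdrgdckv
```

This typically narrows a hung ResourceGroup down to a single wait condition or software deployment in a leaf stack.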

Environment:
  OS: OracleLinux
  Container images : built locally from the kolla repo (tag of 21-Dec-16)
  Deployment : Ansible (from kolla-ansible git tag on 22-Dec-16)
  virt_type : qemu

Below is the snippet of logs from heat_engine

---------------------heat_engine logs -----------------
(Same logs printed continuously)

2017-01-09 12:46:37.469 19 DEBUG heat.engine.scheduler [req-2579096a-5eb2-4d4c-a5e9-c9d867f4d7db - - - - -] Task create from ResourceGroup "kube_masters" Stack "k8s-cluster1-3oxprdrgdckv" [07920bf8-9fe9-4bca-8cc0-2bdb0807c193] running step /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/scheduler.py:215
2017-01-09 12:46:37.476 19 DEBUG heat.engine.scheduler [req-2579096a-5eb2-4d4c-a5e9-c9d867f4d7db - - - - -] Task create from ResourceGroup "kube_masters" Stack "k8s-cluster1-3oxprdrgdckv" [07920bf8-9fe9-4bca-8cc0-2bdb0807c193] sleeping _sleep /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/scheduler.py:156
2017-01-09 12:46:38.026 20 DEBUG heat.engine.scheduler [req-5e80c897-f884-4790-9631-f9e0bb0b0763 - - - - -] Task create from TemplateResource "0" Stack "k8s-cluster1-3oxprdrgdckv-kube_masters-xalkwzpanqnj" [80c6f39a-a0f8-484e-9179-564818299f64] running step /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/scheduler.py:215
2017-01-09 12:46:38.031 20 DEBUG heat.engine.scheduler [req-5e80c897-f884-4790-9631-f9e0bb0b0763 - - - - -] Task create from TemplateResource "0" Stack "k8s-cluster1-3oxprdrgdckv-kube_masters-xalkwzpanqnj" [80c6f39a-a0f8-484e-9179-564818299f64] sleeping _sleep /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/scheduler.py:156
2017-01-09 12:46:38.284 19 DEBUG heat.engine.scheduler [req-69b2dcbb-ab4f-40e9-947a-07941603e91c 781a6a66a5344f679a854302317faeb2 35115cb81a704edc8fddcef6ad094678 - - -] Task create from HeatWaitCondition "master_wait_condition" Stack "k8s-cluster1-3oxprdrgdckv-kube_masters-xalkwzpanqnj-0-ybf72tafbqxu" [894ec426-3e2c-4eb7-ac7f-158b3aff679f] running step /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/scheduler.py:215
2017-01-09 12:46:38.304 19 DEBUG heat.engine.scheduler [req-69b2dcbb-ab4f-40e9-947a-07941603e91c 781a6a66a5344f679a854302317faeb2 35115cb81a704edc8fddcef6ad094678 - - -] Task create from HeatWaitCondition "master_wait_condition" Stack "k8s-cluster1-3o

-------------output of heat resource-show k8s-cluster1-3oxprdrgdckv kube_masters--------------
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes | { |
| | "attributes": null, |
| | "refs": null, |
| | "refs_map": null, |
| | "removed_rsrc_list": [] |
| | } |
| creation_time | 2017-01-09T12:36:16Z |
| description | |
| links | http://10.196.232.132:8004/v1/35115cb81a704edc8fddcef6ad094678/stacks/k8s-cluster1-3oxprdrgdckv/07920bf8-9fe9-4bca-8cc0-2bdb0807c193/resources/kube_masters (self) |
| | http://10.196.232.132:8004/v1/35115cb81a704edc8fddcef6ad094678/stacks/k8s-cluster1-3oxprdrgdckv/07920bf8-9fe9-4bca-8cc0-2bdb0807c193 (stack) |
| | http://10.196.232.132:8004/v1/35115cb81a704edc8fddcef6ad094678/stacks/k8s-cluster1-3oxprdrgdckv-kube_masters-xalkwzpanqnj/80c6f39a-a0f8-484e-9179-564818299f64 (nested) |
| logical_resource_id | kube_masters |
| physical_resource_id | 80c6f39a-a0f8-484e-9179-564818299f64 |
| required_by | api_address_lb_switch |
| | etcd_address_lb_switch |
| resource_name | kube_masters |
| resource_status | CREATE_IN_PROGRESS |
| resource_status_reason | state changed |
| resource_type | OS::Heat::ResourceGroup |
| updated_time | 2017-01-09T12:36:16Z |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Revision history for this message
Murali Annamneni (murali.annamneni) wrote :
summary: - Magnum cluster takes forever to create with kube_masters create in
- progress
+ Magnum cluster takes forever to create with status 'kube_masters create
+ in progress'
Revision history for this message
yatin (yatinkarel) wrote :

@Murali, can you log into the booted instance and check the following there:

ping google.com (VM nodes need internet access to fetch docker images)
etcdctl member list
etcdctl ls
sudo docker images
sudo docker ps
kubectl get nodes
journalctl -fu kubelet
journalctl -fu docker
journalctl -fu flanneld

Also check for any errors in /var/log/cloud-init-output.log

Revision history for this message
Murali Annamneni (murali.annamneni) wrote :

Hi Yatin,
Here are the things you mentioned (attached a few log files):

1) etcdctl member list
7129f3c41a5b5d55: name=10.0.0.11 peerURLs=http://10.0.0.11:2380 clientURLs=http://10.0.0.11:2379 isLeader=true
-----------
2) etcdctl ls
/registry

-----------
3) sudo docker images (after a while the docker service was killed automatically)
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/pause-amd64 3.0 99e59f495ffa 9 months ago 746.9 kB
gcr.io/google_containers/hyperkube v1.2.0 1351a10bb162 10 months ago 316.6 MB
gcr.io/google_containers/podmaster 1.1 1c0e52b333ce 20 months ago 7.922 MB

4)sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7c187afb085a gcr.io/google_containers/hyperkube:v1.2.0 "/hyperkube controlle" 12 seconds ago Up 10 seconds k8s_kube-controller-manager.1cd73811_kube-controller-manager-10.0.0.11_kube-system_cc9923f9d6847f89e900f55e094ed6d3_7b58e6e5
f1bde8f30e28 gcr.io/google_containers/pause-amd64:3.0 "/pause" 14 seconds ago Up 11 seconds k8s_POD.d8dbe16c_kube-controller-manager-10.0.0.11_kube-system_cc9923f9d6847f89e900f55e094ed6d3_b234c669
1d8106f9106e gcr.io/google_containers/hyperkube:v1.2.0 "/hyperkube scheduler" 14 seconds ago Up 12 seconds k8s_kube-scheduler.bb39995_kube-scheduler-10.0.0.11_kube-system_a969cc18743219bc7e3754ca4550d872_983bb0a7
8427fd5c4bfa gcr.io/google_containers/pause-amd64:3.0 "/pause" 16 seconds ago Up 13 seconds k8s_POD.d8dbe16c_kube-scheduler-10.0.0.11_kube-system_a969cc18743219bc7e3754ca4550d872_06085f5d
d36856449959 gcr.io/google_containers/podmaster:1.1 "/podmaster --etcd-se" 16 seconds ago Up 14 seconds k8s_controller-manager-elector.2f137849_kube-podmaster-10.0.0.11_kube-system_0d345212ba86ecbb352a381030d3ecd3_75261a57
972b806da850 gcr.io/google_containers/podmaster:1.1 "/podmaster --etcd-se" 18 seconds ago Up 16 seconds k8s_scheduler-elector.790a6d1d_kube-podmaster-10.0.0.11_kube-system_0d345212ba86ecbb352a381030d3ecd3_11850a6d
3e0d14a84a18 gcr.io/google_containers/hyperkube:v1.2.0 "/hyperkube proxy --m" 22 seconds ago Up 20 seconds k8s_kube-proxy.9ab94563_kube-proxy-10.0.0.11_kube-system_e99c0f24b4bf247c4f2f17cc922df504_b19171c0
20c8901a2272 gcr.io/google_containers/pause-amd64:3.0 "/pause" 51 seconds ago Up 49 seconds k8s_POD.d8dbe16c_kube-podmaster-10.0.0.11_kube-system_0d345212ba86ecbb352a381030d3ecd3_c0b592be
7797800019af gcr.io/google_containers/pause-amd64:3.0 "/pause" 53 seconds ago Up 51 seconds k8s_POD.d8dbe16c_kube-proxy-10.0.0.11_kube-system_e99c0f24b4bf247c4f2f17c...


Revision history for this message
ChanYiLin (j5111261112) wrote :

Hi,
I have encountered this before.
In my case, the reason was that after kube_master launches successfully and starts all the services Kubernetes needs,
kube_master starts wc-notify.service to notify Heat that it has finished.
All wc-notify.service does is curl the Heat API.
But if you configure the endpoint API with a hostname, such as http://hostname:8004/v1/%\(tenant_id\)s,
the curl command in wc-notify.service will also use that hostname, which kube_master cannot resolve, when it tries to reach the Heat API.
Thus, wc-notify.service fails and kube_master stays in CREATE_IN_PROGRESS forever.

There are two solutions:
First, manually add the hostname and IP to /etc/hosts on kube_master.
Second, change the Heat configuration and its endpoint API.

kube_minion hits the same problem; it uses the script /var/lib/cloud/instance/scripts/part-014 to curl the Heat API.

Hope my comment helps.
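A quick way to test for the failure mode described in this comment, runnable anywhere: check whether the hostname in the Heat endpoint URL actually resolves (the endpoint URL below is the example from the comment; the /etc/hosts line it suggests uses placeholder values):

```shell
# Extract the host part of the Heat endpoint URL and try to resolve it.
endpoint='http://hostname:8004/v1/%(tenant_id)s'
host=$(printf '%s' "$endpoint" | sed -E 's#^[a-zA-Z]+://([^:/]+).*#\1#')
echo "endpoint host: $host"
if getent hosts "$host" >/dev/null; then
  echo "resolves"
else
  echo "does not resolve -- on kube_master, map it manually, e.g.:"
  echo "  echo '<controller-ip> $host' | sudo tee -a /etc/hosts"
fi
```

On the master itself, `sudo journalctl -u wc-notify.service` should show the failing curl if this is the cause.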

Revision history for this message
Christian Zunker (christian-zunker) wrote :

I had the same problem with a k8s cluster. In my case, it turned out this was the problem:
https://bugs.launchpad.net/magnum/+bug/1744362

These two helped me fix the problem:
https://ask.openstack.org/en/question/102214/software-deployment-in-heat-problem-with-os-collect-config/
https://bugs.launchpad.net/kolla-ansible/+bug/1762754

The problem was actually in heat config, not magnum.

Revision history for this message
panticz.de (panticz.de) wrote :

Check if there are any issues in your deployment process. Log in to the master node and debug with systemd:

systemctl list-units --failed
journalctl -f
systemctl status kubelet.service

Check that your flavor has no swap, and set some additional parameters for magnum and heat:

# magnum.conf
[cinder]
default_docker_volume_type = VT1

[trust]
cluster_user_trust = True

# heat.conf
[DEFAULT]
region_name_for_services = ch-zh1

If there are no issues, then check the kube-system pods:
kubectl get pods --all-namespaces

Revision history for this message
Debasis (debamondal) wrote :

Hi, I'm facing the same issue! Could someone help me understand how to log in to the master node? I mean, what will be the username and password? I tried 'core' as the username with 'core' as the password!
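For what it's worth (image defaults can vary): the Fedora Atomic images Magnum uses normally disable password login entirely, and you authenticate with the SSH keypair that was passed when the cluster was created. Assuming that keypair's private key is at ~/.ssh/mykey (a placeholder):

```shell
# Default login user on Fedora Atomic images is "fedora"
# ("core" is the CoreOS convention, not Fedora Atomic's).
# Password auth is disabled; use the cluster's keypair instead.
ssh -i ~/.ssh/mykey fedora@<master-floating-ip>
```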
