Bay stuck in CREATE_IN_PROGRESS

Bug #1611849 reported by Carolyn Van Slyck
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
Magnum    New      Undecided    Unassigned

Bug Description

When I created a bay, it became stuck in the CREATE_IN_PROGRESS state. The heat stack is also stuck in CREATE_IN_PROGRESS. This has only happened once so far, and I don't know how to reproduce the bug.

The output of magnum bay-show and heat stack-show is in this gist:
https://gist.github.com/carolynvs/ed5582086b58e00bd71d5c338c98ac0c

I didn't find any mention of the bay or stack ID in the logs, but I do see this error repeatedly in the magnum-conductor.log file:

2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic Traceback (most recent call last):
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic File "/openstack/venvs/magnum-13.1.4/local/lib/python2.7/site-packages/magnum/service/periodic.py", line 199, in _send_bay_metrics
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic monitor.pull_data()
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic File "/openstack/venvs/magnum-13.1.4/local/lib/python2.7/site-packages/magnum/conductor/k8s_monitor.py", line 44, in pull_data
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic self.data['nodes'] = self._parse_node_info(nodes)
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic File "/openstack/venvs/magnum-13.1.4/local/lib/python2.7/site-packages/magnum/conductor/k8s_monitor.py", line 161, in _parse_node_info
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic capacity = ast.literal_eval(node.status.capacity)
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic return _convert(node_or_string)
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic File "/usr/lib/python2.7/ast.py", line 79, in _convert
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic raise ValueError('malformed string')
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic ValueError: malformed string
2016-08-10 11:32:07.735 22169 ERROR magnum.service.periodic
2016-08-10 11:33:08.685 22169 WARNING magnum.common.cert_manager.local_cert_manager [req-e193a46c-4ffa-49ab-b551-b1ad1aa8115a - - - - -] Loading certificate 6bd0b927-9672-4c69-a65a-0623ee25d214 from the local filesystem. CertManager type 'local' should be used for testing purpose.
2016-08-10 11:33:08.686 22169 WARNING magnum.common.cert_manager.local_cert_manager [req-e193a46c-4ffa-49ab-b551-b1ad1aa8115a - - - - -] Loading certificate da577ec8-6dff-461d-bae7-97c4084b20f7 from the local filesystem. CertManager type 'local' should be used for testing purpose.
2016-08-10 11:33:08.730 22169 WARNING magnum.service.periodic [req-e193a46c-4ffa-49ab-b551-b1ad1aa8115a - - - - -] Skip pulling data from bay bfb318e7-7abd-4ea2-a9bd-1db1a5b562a7 due to error: malformed string

Here's the output for the bay mentioned in the error messages above:

magnum bay-show bfb318e7-7abd-4ea2-a9bd-1db1a5b562a7
+--------------------+------------------------------------------------------------+
| Property           | Value                                                      |
+--------------------+------------------------------------------------------------+
| status             | CREATE_COMPLETE                                            |
| uuid               | bfb318e7-7abd-4ea2-a9bd-1db1a5b562a7                       |
| stack_id           | df4cc585-020b-4e50-b0e8-bfc6a75678f3                       |
| status_reason      | Stack CREATE completed successfully                        |
| created_at         | 2016-08-10T13:47:23+00:00                                  |
| updated_at         | 2016-08-10T13:50:59+00:00                                  |
| bay_create_timeout | 60                                                         |
| api_address        | https://172.29.248.58:6443                                 |
| baymodel_id        | 472807c2-f175-4946-9765-149701a5aba7                       |
| master_addresses   | ['172.29.248.58']                                          |
| node_count         | 1                                                          |
| node_addresses     | ['172.29.248.59']                                          |
| master_count       | 1                                                          |
| discovery_url      | https://discovery.etcd.io/53cd3d5bc175f6e366d189131660a969 |
| name               | mycluster                                                  |
+--------------------+------------------------------------------------------------+
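The repeated traceback ends in ast.literal_eval(node.status.capacity). The function ast.literal_eval only accepts a string or an AST node. A different object, such as a dict, ends in ValueError, and so does a string containing anything other than plain literals. Python 2.7 words this error as "malformed string". The log does not say which of these cases applies here. The assumption that capacity arrives as an already-parsed dict is illustrative only.

The Python 3 sketch below reproduces that failure. It also shows a hypothetical defensive helper, parse_capacity, which is not Magnum's actual code:

```python
import ast

# Illustrative values only; not taken from the bay in this report.
SAMPLE_CAPACITY = {"cpu": "1", "memory": "2048Ki", "pods": "110"}


def parse_capacity(capacity):
    """Hypothetical helper: normalise a node capacity value to a dict.

    Assumes the value is either already a dict or the string repr of one.
    This is an illustration, not Magnum's implementation.
    """
    if isinstance(capacity, dict):
        # Already structured; passing it to literal_eval would raise ValueError.
        return dict(capacity)
    if isinstance(capacity, str):
        # literal_eval safely evaluates plain literals such as "{'cpu': '1'}".
        value = ast.literal_eval(capacity)
        if isinstance(value, dict):
            return value
    raise ValueError("unsupported capacity value: %r" % (capacity,))


if __name__ == "__main__":
    try:
        ast.literal_eval(SAMPLE_CAPACITY)  # a dict is neither a string nor an AST node
    except ValueError as exc:
        print("literal_eval rejected the dict:", exc)
    print(parse_capacity(SAMPLE_CAPACITY))
    print(parse_capacity(repr(SAMPLE_CAPACITY)))
```

The helper accepts both shapes instead of guessing which one the Kubernetes client returns. The periodic task would then no longer depend on that detail.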

hongbin (hongbin034) wrote :

Hi Carolyn,

Could you follow this guide to confirm which Heat resource(s) are still pending?
http://docs.openstack.org/developer/magnum/troubleshooting-guide.html#heat-stacks
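Heat reports each resource status as an action followed by a state, for example CREATE_IN_PROGRESS or CREATE_COMPLETE. Spotting the pending resource therefore amounts to filtering the stack's resources on the state part.

Below is a minimal Python sketch. It assumes the (name, status) pairs have already been copied out of the stack's resource listing. The helper and the resource names in the example are hypothetical:

```python
from typing import Iterable, List, Tuple

# Heat state suffixes that mean a resource has not finished successfully.
UNFINISHED_SUFFIXES = ("_IN_PROGRESS", "_FAILED")


def unfinished_resources(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (resource_name, resource_status) pairs still pending or failed."""
    return [(name, status) for name, status in rows
            if status.endswith(UNFINISHED_SUFFIXES)]


# Hypothetical rows; real names and statuses come from the stack itself.
example = [
    ("masters", "CREATE_COMPLETE"),
    ("minions", "CREATE_IN_PROGRESS"),
]
print(unfinished_resources(example))  # prints [('minions', 'CREATE_IN_PROGRESS')]
```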

Carolyn Van Slyck (carolynvs) wrote :

I just had this happen again today. Here is more information about the Heat stack. It looks like the minions are stuck?

https://gist.github.com/carolynvs/0f9b6db9b43156cf41fe5a97eee6b871

hongbin (hongbin034) wrote :

Yes, it looks like the minion node failed to bootstrap. Could you SSH to the minion node, run the commands below, and paste the output here?

$ sudo systemctl --full list-units --no-pager
$ sudo journalctl -u cloud-config --no-pager
$ sudo journalctl -u cloud-final --no-pager
$ sudo journalctl -u cloud-init --no-pager
$ sudo cat /var/log/cloud-init-output.log
$ sudo journalctl -u kubelet --no-pager
$ sudo journalctl -u etcd --no-pager
$ sudo journalctl -u docker --no-pager
$ sudo docker ps -a
$ sudo journalctl -u flanneld --no-pager
$ sudo journalctl -u wc-notify --no-pager
$ cat /etc/sysconfig/heat-params
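When collecting this many logs, it can help to run each command in turn and keep going when one of them fails. A missing tool then shows up as a line in the report rather than ending the session.

Below is a minimal Python 3.7+ sketch of that idea. It assumes passwordless sudo; otherwise sudo waiting for a password can stall a command until the timeout. run_diagnostics is a hypothetical helper, not part of Magnum:

```python
import shutil
import subprocess
from typing import Dict, List


def run_diagnostics(commands: List[List[str]], timeout: float = 60.0) -> Dict[str, str]:
    """Run each command and map its text form to its output or an error note."""
    results: Dict[str, str] = {}
    for argv in commands:
        key = " ".join(argv)
        # Look past a leading "sudo" so the real tool is the one checked.
        tool = argv[1] if argv[0] == "sudo" and len(argv) > 1 else argv[0]
        if shutil.which(tool) is None:
            results[key] = "not found: " + tool
            continue
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[key] = "timed out after %ss" % timeout
            continue
        except OSError as exc:
            # For example, sudo itself is not installed.
            results[key] = "could not start: %s" % exc
            continue
        # Keep stdout and stderr together, prefixed with the exit status.
        results[key] = "[exit %d]\n%s%s" % (proc.returncode, proc.stdout, proc.stderr)
    return results
```

For example, you can pass the commands above as argument lists, such as ["sudo", "journalctl", "-u", "kubelet", "--no-pager"]. The helper returns one entry per command, including a "not found" note for any tool the host lacks.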

Carolyn Van Slyck (carolynvs) wrote :

Okay, I just had another one become stuck, so I tried running the commands above. I think my environment is set up in such a way that none of those commands work, perhaps because I am on Ubuntu?

+ sudo systemctl --full list-units --no-pager
sudo: systemctl: command not found
+ sudo journalctl -u cloud-config --no-pager
sudo: journalctl: command not found
+ sudo journalctl -u cloud-final --no-pager
sudo: journalctl: command not found
+ sudo journalctl -u cloud-init --no-pager
sudo: journalctl: command not found
+ sudo cat /var/log/cloud-init-output.log
cat: /var/log/cloud-init-output.log: No such file or directory
+ sudo journalctl -u kubelet --no-pager
sudo: journalctl: command not found
+ sudo journalctl -u etcd --no-pager
sudo: journalctl: command not found
+ sudo journalctl -u docker --no-pager
sudo: journalctl: command not found
+ docker ps -a
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
+ sudo journalctl -u flanneld --no-pager
sudo: journalctl: command not found
+ sudo journalctl -u wc-notify --no-pager
sudo: journalctl: command not found
+ cat /etc/sysconfig/heat-params
cat: /etc/sysconfig/heat-params: No such file or directory

I ran uname in case it helps identify which commands would work to gather this info:

# uname -a
Linux infra1-utility-container-919baab5 4.2.0-41-generic #48~14.04.1-Ubuntu SMP Fri Jun 24 17:09:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
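The kernel string above names an Ubuntu 14.04 build. If the userland is 14.04 as well, it boots with Upstart rather than systemd, which may be why systemctl and journalctl are absent. One way to check before picking log commands is the test documented for sd_booted(3): systemd is the running init only if the directory /run/systemd/system/ exists.

Below is a minimal Python sketch. Both helpers are hypothetical, and the Upstart log location used in the fallback is an assumption:

```python
import os
from typing import List


def is_systemd_booted(root: str = "/") -> bool:
    """Return True if the host appears to have been booted with systemd.

    Mirrors the sd_booted(3) check: /run/systemd/system/ exists only when
    systemd is the running init. `root` lets the check target another tree.
    """
    return os.path.isdir(os.path.join(root, "run", "systemd", "system"))


def log_command(unit: str, root: str = "/") -> List[str]:
    """Hypothetical helper: choose a command that shows a service's logs."""
    if is_systemd_booted(root):
        return ["journalctl", "-u", unit, "--no-pager"]
    # Assumption: on Upstart hosts, job output is kept in /var/log/upstart/<job>.log.
    return ["cat", os.path.join("/var/log/upstart", unit + ".log")]
```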

hongbin (hongbin034) wrote :

It seems you were using an Ubuntu-based image to create a k8s bay? If so, are you able to switch the OS? For k8s, the supported OSes are fedora-atomic and CoreOS. Below are the links to download the images:

* Atomic: https://fedorapeople.org/groups/magnum/fedora-atomic-latest.qcow2
* CoreOS: http://beta.release.core-os.net/amd64-usr/current/coreos_production_openstack_image.img.bz2
