Swarmbay gets stuck on creation

Bug #1515997 reported by slotti
This bug affects 1 person
Affects: Magnum
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Swarmbay stays in "CREATE_IN_PROGRESS" status on Red Hat with OpenStack Liberty and Magnum from either the master or the stable/liberty branch.

cloud-init doesn't complete.

$ cat /var/log/cloud-init-output.log
Cloud-init v. 0.7.5 running 'init-local' at Fri, 13 Nov 2015 12:12:55 +0000. Up 3.74 seconds.
Cloud-init v. 0.7.5 running 'init' at Fri, 13 Nov 2015 12:12:56 +0000. Up 4.47 seconds.
ci-info: +++++++++++++++++++++++++Net device info+++++++++++++++++++++++++
ci-info: +--------+------+-----------+---------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+-----------+---------------+-------------------+
ci-info: | lo: | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0: | True | 10.0.0.4 | 255.255.255.0 | fa:16:3e:3c:95:69 |
ci-info: +--------+------+-----------+---------------+-------------------+
ci-info: +++++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.0.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 10.0.0.2 | 255.255.255.255 | eth0 | UGH |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
Cloud-init v. 0.7.5 running 'modules:config' at Fri, 13 Nov 2015 12:13:00 +0000. Up 8.55 seconds.
Cloud-init v. 0.7.5 running 'modules:final' at Fri, 13 Nov 2015 12:13:01 +0000. Up 9.26 seconds.
removing docker key
Generating RSA private key, 4096 bit long modulus
...................................................................++
..................................++
unable to write 'random state'
e is 65537 (0x10001)
configuring swarm ...
starting services
activating service docker.socket
Created symlink from /etc/systemd/system/sockets.target.wants/docker.socket to /etc/systemd/system/docker.socket.
activating service swarm-agent
Created symlink from /etc/systemd/system/multi-user.target.wants/swarm-agent.service to /etc/systemd/system/swarm-agent.service.
notifying heat
2015-11-13 12:13:03,005 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-012 [7]
2015-11-13 12:13:03,007 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2015-11-13 12:13:03,008 - util.py[WARNING]: Running scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
Cloud-init v. 0.7.5 finished at Fri, 13 Nov 2015 12:13:03 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 10.98 seconds
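
The WARNING lines are the relevant part of this log. Below is a minimal sketch for pulling the failing script and its exit code out of a cloud-init output log. The function name `failed_parts` is made up for illustration, and the pattern assumes the `Failed running <path> [<code>]` line format shown above.

```shell
# Print "<script path> <exit code>" for every user script that cloud-init
# reports as failed. Assumes log lines of the form
#   ... util.py[WARNING]: Failed running /path/to/script [N]
failed_parts() {
    sed -n 's/.*Failed running \([^ ]*\) \[\([0-9]*\)].*/\1 \2/p' "$1"
}

# Usage on the node:
#   failed_parts /var/log/cloud-init-output.log
```

For the log above this prints `/var/lib/cloud/instance/scripts/part-012 7`. If that script reports back to Heat with curl (an assumption; the log only shows "notifying heat"), exit code 7 would be curl's "failed to connect to host".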

[fedora@sw-ca5oxgstk5r-1-dpumws6rjtnr-swarm-node-bdqpisvi4fdk ~]$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[fedora@sw-ca5oxgstk5r-1-dpumws6rjtnr-swarm-node-bdqpisvi4fdk ~]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
docker.io/swarm 0.4.0 556c60f87888 4 weeks ago 10.2 MB

slotti (3-christian-4)
summary: - Swarmbay doesn't stuck on creation
+ Swarmbay gets stuck on creation
hongbin (hongbin034) wrote :

Hi Christian,

Is the error consistently reproducible? If yes, could you run the following command and paste the output here:

$ sudo systemctl list-units

You should see one or more failing services. Please paste the logs of the failed services. For example:

$ sudo journalctl -u cloud-config --no-pager
$ sudo journalctl -u cloud-final --no-pager
$ sudo journalctl -u cloud-init-local --no-pager
$ sudo journalctl -u cloud-init --no-pager
$ sudo journalctl -u etcd --no-pager
$ sudo journalctl -u swarm-agent --no-pager
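
If it is easier, a loop along these lines should save each journal to its own file so they can be attached together. This is only a sketch: the function name `collect_unit_logs`, the `JOURNALCTL` override variable, and the output directory are placeholders, and the unit list is the one above.

```shell
# Save the journal of each given systemd unit to <outdir>/<unit>.log.
# JOURNALCTL can be overridden (e.g. JOURNALCTL="sudo journalctl").
collect_unit_logs() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for unit in "$@"; do
        # Left unquoted on purpose so "sudo journalctl" splits into two words.
        ${JOURNALCTL:-journalctl} -u "$unit" --no-pager > "$outdir/$unit.log" 2>&1
    done
}

# Usage on the node:
#   JOURNALCTL="sudo journalctl" collect_unit_logs /tmp/bay-logs \
#       cloud-config cloud-final cloud-init-local cloud-init etcd swarm-agent
```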

slotti (3-christian-4) wrote :

Here they are, as attachments.

slotti (3-christian-4) wrote :

Sorry for the multiple posts. Here they are in a single tar archive.

hongbin (hongbin034) wrote :

I cannot reproduce the error. From the logs, it looks like Docker failed to pull images from Docker Hub. Could you run the following commands on the master node and paste the output here:

$ curl openstack.org
$ docker --version
$ sudo docker run hello-world
$ cat /etc/systemd/system/swarm-agent.service

The last command prints the configuration of the swarm agent. Could you re-run the commands listed in it and paste the output as well? For example:

$ sudo /usr/bin/docker kill swarm-agent
$ sudo /usr/bin/docker rm swarm-agent
$ sudo /usr/bin/docker pull swarm:0.4.0
$ sudo /usr/bin/docker run -e http_proxy= -e https_proxy= -e no_proxy= --name swarm-agent swarm:0.4.0 join --addr X.X.X.X:2375 token://XXX

Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-1
slotti (3-christian-4) wrote :

Hey,
I attached the output.
All commands were executed on the swarm-master (49.log) and swarm-node (50.log).

The configuration of the agent tries to start swarm:1.0.0.

Could that be the error?

slotti (3-christian-4) wrote :

Ok, I know what was wrong.

openstack-heat-api-cfn was not installed and running on my heat-node.

Sorry for wasting your time.
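
For anyone who hits the same symptom: given the cause above, it may be worth checking from the bay node that the heat-api-cfn endpoint is reachable at all. A rough sketch follows. The function name is made up, and the host and port are assumptions (8000 is the usual heat-api-cfn default; use whatever your deployment configures).

```shell
# Return 0 if something answers HTTP on host:port, non-zero otherwise.
# Any HTTP response (even an error status) counts as reachable; only
# connection-level failures such as curl exit code 7 count as unreachable.
cfn_endpoint_reachable() {
    host=$1
    port=${2:-8000}   # assumed default port for heat-api-cfn
    curl -s -o /dev/null --connect-timeout 5 "http://$host:$port/"
}

# Usage from the bay node (HEAT_HOST is a placeholder):
#   cfn_endpoint_reachable "$HEAT_HOST" || echo "heat-api-cfn not reachable"
```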

Eli Qiao (taget-9)
Changed in magnum:
status: New → Invalid