Failed to start any docker instance on atomic-5 image

Bug #1499607 reported by Eli Qiao
This bug affects 6 people
| Affects | Status     | Importance | Assigned to | Milestone |
| Magnum  | Incomplete | Undecided  | Unassigned  |           |

Bug Description

Using the atomic-5 image to start a swarm bay,
the swarm-manager and swarm-node services cannot be started:
http://paste.openstack.org/show/473987/

Revision history for this message
Eli Qiao (taget-9) wrote :

[fedora@swarmbay5-dxjbycmlbxig-swarm-master-ehfcg6hpsq6q ~]$ sudo docker version
Client version: 1.7.1.fc21
Client API version: 1.19
Package Version (client): docker-io-1.7.1-2.git33de319.fc21.x86_64
Go version (client): go1.4.2
Git commit (client): 33de319/1.7.1
OS/Arch (client): linux/amd64
Server version: 1.7.1.fc21
Server API version: 1.19
Package Version (server): docker-io-1.7.1-2.git33de319.fc21.x86_64
Go version (server): go1.4.2
Git commit (server): 33de319/1.7.1
OS/Arch (server): linux/amd64

A search shows that others have the same issue with docker 1.7.1:
https://forums.docker.com/t/cannot-manage-containers-are-you-trying-to-connect-to-a-tls-enabled-daemon-without-tls/2144

Changed in magnum:
assignee: nobody → Eli Qiao (taget-9)
Revision history for this message
Steve Adams (sa240s) wrote :

Using atomic-5, I can get the swarm master and two swarm nodes to "ACTIVE" and ssh into them (see below).
My "container-create" fails with a docker internal error.
I'm running docker 1.8.2 on the host, and the swarm bays are running 1.7.1.

Is this bug about starting containers using those swarm bays?

----------------------- example output...

stack@Steve-Magnum-AllInOne:~/devstack$ nova list
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+
| 061562d9-ece2-470e-b121-cf5ac3b39e54 | sw-mba4vgeydz7-0-g66yrvsjskel-swarm_node-xfmgaj7epnzk | ACTIVE | - | Running | swarmbay-jf2m3wlc2jem-fixed_network-thlrqkjmi4yz=10.0.0.5, 172.24.4.6 |
| 1c1ab216-2197-47bf-b31f-9ab9e5231f93 | sw-mba4vgeydz7-1-ekfk2etlmlei-swarm_node-ojq65eq6avso | ACTIVE | - | Running | swarmbay-jf2m3wlc2jem-fixed_network-thlrqkjmi4yz=10.0.0.4, 172.24.4.5 |
| 55b3e0c0-5f62-4075-9309-0759f75e21f9 | swarmbay-jf2m3wlc2jem-swarm_master-oikqqxyccjo2 | ACTIVE | - | Running | swarmbay-jf2m3wlc2jem-fixed_network-thlrqkjmi4yz=10.0.0.3, 172.24.4.4 |
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+
stack@Steve-Magnum-AllInOne:~/devstack$

devstack$ magnum container-create --name test-container \
> --image cirros \
> --bay swarmbay \
> --command "ping -c 4 8.8.8.8"
ERROR: Docker internal Error: ('Connection aborted.', error(111, 'ECONNREFUSED')) (HTTP 500)
stack@Steve-Magnum-AllInOne:~/devstack$ docker version
Client:
 Version: 1.8.2
 API version: 1.20
 Go version: go1.4.2
 Git commit: 0a8c2e3
 Built: Thu Sep 10 19:19:00 UTC 2015
 OS/Arch: linux/amd64

Server:
 Version: 1.8.2
 API version: 1.20
 Go version: go1.4.2
 Git commit: 0a8c2e3
 Built: Thu Sep 10 19:19:00 UTC 2015
 OS/Arch: linux/amd64

stack@Steve-Magnum-AllInOne:~/devstack$ ssh fedora@172.24.4.4
The authenticity of host '172.24.4.4 (172.24.4.4)' can't be established.
ECDSA key fingerprint is 5e:10:c4:69:db:bb:fe:ab:5b:8d:04:ac:b3:39:58:e2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.24.4.4' (ECDSA) to the list of known hosts.
[fedora@swarmbay-jf2m3wlc2jem-swarm-master-oikqqxyccjo2 ~]$
[fedora@swarmbay-jf2m3wlc2jem-swarm-master-oikqqxyccjo2 ~]$
[fedora@swarmbay-jf2m3wlc2jem-swarm-master-oikqqxyccjo2 ~]$
[fedora@swarmbay-jf2m3wlc2jem-swarm-master-oikqqxyccjo2 ~]$
[fedora@swarmbay-jf2m3wlc2jem-swarm-master-oikqqxyccjo2 ~]$ docker ps
Get h...


Revision history for this message
Egor Guz (eghobo) wrote :

@Eli, I tested kub with atomic-5; it can start docker instances without any problems and I can attach to them.

But busybox does not start cleanly:
docker run -it busybox
Post http:///var/run/docker.sock/v1.19/containers/b350ac5a0823a69207792d813e157c2b1b56d608eebefcfc0ea5995c6146e0c4/start: EOF. Are you trying to connect to a TLS-enabled daemon without TLS?

But the instance is actually running:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b350ac5a0823 busybox "sh" 8 seconds ago Up 7 seconds insane_rosalind

And I can attach to it
docker exec -i -t b350ac5a0823 /bin/sh

Revision history for this message
Eli Qiao (taget-9) wrote :

Hi Steven and Egor,

Are you using a baymodel with the attribute insecure=False?
| insecure | False |

Yes, I can create a bay that reaches 'CREATE_COMPLETE', but swarm-manager failed to start on the master:

Sep 28 02:48:36 swarmbay-pn4pjq4gg4zw-swarm-master-bfx7doenlits.novalocal systemd[1]: swarm-manager.service: main process exited, code=exited, status=1/FAILURE
Sep 28 02:48:36 swarmbay-pn4pjq4gg4zw-swarm-master-bfx7doenlits.novalocal docker[1419]: Post http:///var/run/docker.sock/v1.19/containers/6aaadf3a26d41e1754da4acad654b6226f2b651718f8ad9ef5d985c475d7bbd2/start: EOF. Are you trying to connect to a TLS-enabled daemon without TLS?

I guess we start the docker daemon in TLS mode?

[fedora@swarmbay-pn4pjq4gg4zw-swarm-master-bfx7doenlits ~]$ ps aux | grep docker
root 822 0.1 1.3 686612 27124 ? Ssl 02:36 0:01 /usr/bin/docker -d -H fd:// -H tcp://0.0.0.0:2375 --tls --tlsverify --tlscacert="/etc/docker/ca.crt" --tlskey="/etc/docker/server.key" --tlscert="/etc/docker/server.crt" --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.datadev=/dev/mapper/atomicos-docker--data --storage-opt dm.metadatadev=/dev/mapper/atomicos-docker--meta

See this change: https://review.openstack.org/#/c/212598/
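
The TLS flags in the `ps` output above can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell. The function name `daemon_tls_mode` is hypothetical, and it only inspects a command-line string for the bare `--tls` and `--tlsverify` flags; it does not contact the daemon.

```shell
# Hypothetical helper: classify a docker daemon command line by its TLS flags.
# Only the bare flags are recognised (not forms such as --tlsverify=false).
daemon_tls_mode() {
    case " $1 " in
        *" --tlsverify "*) echo "tlsverify" ;;
        *" --tls "*)       echo "tls" ;;
        *)                 echo "plain" ;;
    esac
}

# On a bay node one might feed it the running daemon's arguments, e.g.:
# daemon_tls_mode "$(ps -o args= -C docker | head -n 1)"
```

Note that the error in the logs is reported on the local socket (/var/run/docker.sock), so the TLS hint in the message may be misleading even when these flags are set.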

Revision history for this message
Eli Qiao (taget-9) wrote :

I found that this is a bug in docker 1.7.1; see the bug report below:

https://bugzilla.redhat.com/show_bug.cgi?id=1244124

New packages with fixes are already available.

I found a possible workaround.

If the first `systemctl start swarm-agent.service` fails to start the service,
the container has already been created and can be found with `docker ps -a`.
Then run `docker start $container_id` again. Even though it prints the error message again:
Post http:///var/run/docker.sock/v1.19/containers/54d4b47fbd03c3f77340b06e7deb50830f531a1186ba12da9527f3504d14d434/start: EOF. Are you trying to connect to a TLS-enabled daemon without TLS?

the container is actually running.
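
The workaround can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `start_and_verify` and the `DOCKER` override variable are hypothetical, and the container name in the example is taken from the unit files in this thread. The idea is to ignore the error printed by `docker start` and ask the daemon for the container's recorded state instead.

```shell
# DOCKER can be overridden (e.g. with a stub) for testing; defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: start a container, ignore the possibly spurious error,
# then check whether the daemon reports the container as running.
start_and_verify() {
    name="$1"
    # The start call may print the EOF/TLS error even when it succeeds.
    $DOCKER start "$name" >/dev/null 2>&1 || true
    # Trust the daemon's recorded state rather than the exit code above.
    state="$($DOCKER inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" || state=""
    [ "$state" = "true" ]
}

# Example: start_and_verify swarm-agent && echo "swarm-agent is running"
```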

Eli Qiao (taget-9)
Changed in magnum:
assignee: Eli Qiao (taget-9) → nobody
Revision history for this message
Eli Qiao (taget-9) wrote :

This is a test of the TLS-enabled case with the atomic-5 image: http://paste.openstack.org/show/474775/

Revision history for this message
Eli Qiao (taget-9) wrote :

(14:51:16) Tango: eliqiao: We discussed the docker run problem with atomic-5 on the IRC meeting this week
(14:51:29) Tango: eliqiao: It's important to fix this
(14:51:55) Tango: eliqiao: apmelton confirmed that the build from kojipkgs for docker 1.7.1 is bad
(14:52:45) Tango: eliqiao: We agreed to try out docker 1.8.1. I have built the new image and am testing it now

Changed in magnum:
status: New → Confirmed
Revision history for this message
Eli Qiao (taget-9) wrote :

Update from Ton's mail

I rebuilt the 2 images and uploaded to
https://fedorapeople.org/groups/magnum/

Here are the md5sums for Eli:
fedora-21-atomic-5-d181.qcow2 cebefc0c21fb8567e662bf9f2d5b78b0
fedora-21-atomic-6-d181.qcow2 f952a15064c6366fde6af8b3a452c699

Egor, Ton and I are working on testing the new images.

-Eli
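
A downloaded image can be checked against the md5sums quoted above before uploading it to Glance. A minimal sketch, assuming GNU `md5sum` is available; the helper name `verify_md5` is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# Example with the checksums from Ton's mail (after downloading the images):
# verify_md5 fedora-21-atomic-5-d181.qcow2 cebefc0c21fb8567e662bf9f2d5b78b0
# verify_md5 fedora-21-atomic-6-d181.qcow2 f952a15064c6366fde6af8b3a452c699
```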

Revision history for this message
Ma Wen Cheng (mars914) wrote :

I tried the new images fedora-21-atomic-5-d181.qcow2 and fedora-21-atomic-6-d181.qcow2; creating a swarm bay failed with the error:
Resource CREATE failed: WaitConditionFailure: resources.swarm_nodes.resources[0].resources.node_agent_wait_condition: swarm-agent service failed to start.

It looks like the images have an internal problem.

Revision history for this message
Egor Guz (eghobo) wrote :

@Ma Wen Cheng: could you log in to the master and agent and see what the actual error is? Also keep in mind that swarm needs internet access to pull the swarm docker image from docker.io.

Revision history for this message
Eli Qiao (taget-9) wrote :

+1 for Egor's comments. I tested in my local environment and it works fine. Please note that the new image should only work with the swarm 0.4.0 image.

Revision history for this message
Ma Wen Cheng (mars914) wrote :

@Egor Guz, yes, I logged in to the swarm master node to check the status of the services; docker, swarm-agent and swarm-manager all failed.
[fedora@swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu ~]$ sudo journalctl -xe
Oct 15 03:35:30 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal kernel: device-mapper: ioctl: unable to remove open device docker-253:0-270518-base
Oct 15 03:35:34 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal docker[1497]: time="2015-10-15T03:35:34.763071183Z" level=fatal msg="Error starting daemon: error initializing graphdriver: De
Oct 15 03:35:34 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 15 03:35:34 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
Oct 15 03:35:34 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal systemd[1]: Dependency failed for Swarm Manager.
-- Subject: Unit swarm-manager.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit swarm-manager.service has failed.
--
-- The result is dependency.
Oct 15 03:35:34 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal systemd[1]: Triggering OnFailure= dependencies of swarm-manager.service.
Oct 15 03:35:35 swarmbay-d181-ogkqhbzvj3kk-swarm-master-n7j4hxrnppgu.novalocal systemd[1]: Dependency failed for Swarm Agent.
-- Subject: Unit swarm-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit swarm-agent.service has failed.
--
-- The result is dependency.

I tried the fedora-21-atomic-5 image and it works, but fedora-21-atomic-5-d181 does not.

Revision history for this message
Egor Guz (eghobo) wrote :

@Ma Wen Cheng: 'swarm:0.2.0' doesn't work with fedora-21-atomic-5-d18/Docker 1.8.1; please pick up the latest master, where we have already upgraded to 'swarm:0.4.0'. Just FYI, Ton renamed fedora-21-atomic-5-d18 to fedora-21-atomic-5 in the Fedora repo yesterday.
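
To see which swarm image tag a bay node was actually configured with, the generated unit file can be inspected. A minimal sketch; the helper name `swarm_tags_in_unit` is hypothetical, and the unit path in the example is the one shown elsewhere in this thread.

```shell
# Hypothetical helper: print the distinct swarm image tags referenced in a unit file.
swarm_tags_in_unit() {
    grep -o 'swarm:[0-9][0-9.]*' "$1" | sort -u
}

# Example on a bay node:
# swarm_tags_in_unit /etc/systemd/system/swarm-agent.service
```

If this prints swarm:0.2.0, the bay was created from templates that predate the upgrade mentioned above.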

Revision history for this message
Ma Wen Cheng (mars914) wrote :

@Egor Guz, I verified again, but the problem still exists:
[fedora@swarmbay-d181-oul2fzjdpx6t-swarm-master-2aft4uylm6js ~]$ sudo systemctl status swarm-agent
● swarm-agent.service - Swarm Agent
   Loaded: loaded (/etc/systemd/system/swarm-agent.service; enabled)
   Active: inactive (dead)

Oct 15 10:04:59 swarmbay-d181-oul2fzjdpx6t-swarm-master-2aft4uylm6js.novalocal systemd[1]: Dependency failed for Swarm Agent.
Oct 15 10:04:59 swarmbay-d181-oul2fzjdpx6t-swarm-master-2aft4uylm6js.novalocal systemd[1]: Triggering OnFailure= dependencies of swarm-agent.service.
[fedora@swarmbay-d181-oul2fzjdpx6t-swarm-master-2aft4uylm6js ~]$ cat /etc/systemd/system/swarm-agent.service
[Unit]
Description=Swarm Agent
After=docker.service
Requires=docker.service
OnFailure=swarm-agent-failure.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill swarm-agent
ExecStartPre=-/usr/bin/docker rm swarm-agent
ExecStartPre=-/usr/bin/docker pull swarm:0.4.0
ExecStart=/usr/bin/docker run -e http_proxy= -e https_proxy= -e no_proxy= --name swarm-agent swarm:0.4.0 join --addr 10.0.0.3:2375 token://d178a1c9fad231df9c732f498dfd03ca
ExecStop=/usr/bin/docker stop swarm-agent
ExecStartPost=/usr/bin/curl -sf -X PUT -H 'Content-Type: application/json' \
  --data-binary '{"Status": "SUCCESS", "Reason": "Setup complete", "Data": "OK", "UniqueId": "00000"}' \
  "http://9.5.124.160:8000/v1/waitcondition/arn%3Aopenstack%3Aheat%3A%3A8c4613a4336b4ba99c0490686327064e%3Astacks%2Fswarmbay-d181-oul2fzjdpx6t%2Fc143d5be-ae62-44f9-b02c-f7c5b24bf800%2Fresources%2Fagent_wait_handle?Timestamp=2015-10-15T09%3A54%3A22Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=8585df8b87c445339949e0fda216136b&SignatureVersion=2&Signature=537EKeYpZUkxhYvleKQ56mpRffWUei4BLcnI61jj4Ck%3D"

[Install]
WantedBy=multi-user.target
[fedora@swarmbay-d181-oul2fzjdpx6t-swarm-master-2aft4uylm6js ~]$ docker -v
Docker version 1.8.1.fc21, build 32b8b25/1.8.1

Revision history for this message
Eli Qiao (taget-9) wrote :

I tested it and it works for me. Could you post the output of:

sudo systemctl status swarm-manager.service -l

sudo systemctl status swarm-agent.service -l

Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-1
Revision history for this message
Eli Qiao (taget-9) wrote :

I think this issue is fixed by updating the atomic image.

Changed in magnum:
status: Confirmed → Incomplete