swarm bay is "CREATE_COMPLETE" but the bay is not actually usable

Bug #1500291 reported by Eli Qiao
This bug affects 1 person
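+---------+--------------+------------+-------------+-----------+
| Affects | Status       | Importance | Assigned to | Milestone |
+---------+--------------+------------+-------------+-----------+
| Magnum  | Fix Released | High       | Eli Qiao    |           |
+---------+--------------+------------+-------------+-----------+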

Bug Description

I created a bay using the atomic-5 image with coe=swarm:

+--------------------+------------------------------------------+
| Property           | Value                                    |
+--------------------+------------------------------------------+
| status             | CREATE_COMPLETE                          |
| uuid               | 0c7fea29-cb66-4d6a-bcd7-698f7e6243b9     |
| status_reason      | Stack CREATE completed successfully      |
| created_at         | 2015-09-28T02:34:54+00:00                |
| updated_at         | 2015-09-28T02:37:11+00:00                |
| bay_create_timeout | 0                                        |
| api_address        | 172.24.4.206                             |
| baymodel_id        | 57614820-1871-45a6-af3a-024aac27c298     |
| node_count         | 1                                        |
| node_addresses     | [u'172.24.4.207']                        |
| master_count       | 1                                        |
| discovery_url      | token://c0717094a987deeb88d6e7b5aba6d14a |
| name               | swarmbay                                 |
+--------------------+------------------------------------------+

After some time the bay was in the CREATE_COMPLETE state, but when I logged in to the bay's VMs, I found that swarm-master and swarm-node had failed to run:
[fedora@swarmbay-pn4pjq4gg4zw-swarm-master-bfx7doenlits ~]$ sudo docker ps -a
CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS                        PORTS   NAMES
6aaadf3a26d4   swarm:0.2.0   "/swarm manage -H tc     42 minutes ago   Exited (137) 42 minutes ago           swarm-manager
417356a72d82   swarm:0.2.0   "/swarm join --addr      53 minutes ago   Exited (2) 53 minutes ago             swarm-agent

We need some validation before reporting the bay as usable.
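As a minimal sketch of the symptom check the reporter describes, the helper below flags any container whose status begins with "Exited", as both swarm containers above do. It assumes a Docker version that supports `docker ps --format` and requests one tab-separated `name<TAB>status` line per container; the function names are hypothetical and not part of Magnum.

```python
import subprocess
from typing import List

# Go template for `docker ps --format`: container name, a tab, then its status.
PS_FORMAT = "{{.Names}}\t{{.Status}}"


def read_container_statuses() -> str:
    """Hypothetical helper: run `docker ps -a` locally and return its output."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def exited_containers(ps_output: str) -> List[str]:
    """Return the names of containers whose status starts with 'Exited'."""
    failed = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, status = line.partition("\t")
        if status.startswith("Exited"):
            failed.append(name)
    return failed


# Status lines as they would appear for the two containers in this report.
sample = (
    "swarm-manager\tExited (137) 42 minutes ago\n"
    "swarm-agent\tExited (2) 53 minutes ago\n"
)
print(exited_containers(sample))  # ['swarm-manager', 'swarm-agent']
```

On a live master VM, `exited_containers(read_container_statuses())` would return the same kind of list; a non-empty result means the bay is not usable even though Heat reported success.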

Eli Qiao (taget-9)
Changed in magnum:
status: New → Confirmed
hongbin (hongbin034) wrote :

Basically, we need to fix the swarm template to make it work with the new image. This is a high-priority bug.

Changed in magnum:
importance: Undecided → High
Ton Ngo (ton-i) wrote :

@hongbin: I think you mean this bug, which documents the specific problem with "docker run" in a swarm cluster:

https://bugs.launchpad.net/magnum/+bug/1499607

This bug (1500291) covers the more general scenario, in which the cluster completes as far as Heat is concerned but the services in the cluster are not running correctly. This can happen with either a k8s or a swarm cluster. We need to add validation code that checks the health of the services in the cluster.
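One way to express that validation is a bounded polling loop: the bay counts as ready only once every service probe reports healthy, and as failed if the deadline passes first. This is a minimal sketch, not Magnum's implementation; `wait_for_services`, the probe callables, and the default timeout and interval are hypothetical. A real probe might, for example, query the swarm manager or the Kubernetes API server.

```python
import time
from typing import Callable, Dict, List, Tuple

# A probe returns True once the service it checks is healthy.
Probe = Callable[[], bool]


def _healthy(probe: Probe) -> bool:
    """Run a probe, treating any exception as "not healthy yet"."""
    try:
        return bool(probe())
    except Exception:
        return False


def wait_for_services(probes: Dict[str, Probe],
                      timeout: float = 300.0,
                      interval: float = 5.0) -> Tuple[str, List[str]]:
    """Poll every probe until all pass or the timeout expires.

    Returns ("CREATE_COMPLETE", []) when every service is healthy, or
    ("CREATE_FAILED", [unhealthy service names]) when time runs out.
    """
    deadline = time.monotonic() + timeout
    pending = dict(probes)
    while True:
        # Re-check only the services that have not passed yet.
        pending = {name: p for name, p in pending.items() if not _healthy(p)}
        if not pending:
            return "CREATE_COMPLETE", []
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "CREATE_FAILED", sorted(pending)
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A service that passes once is not re-checked, which keeps the loop simple but will not catch a service that starts and then crashes; a stricter variant would re-probe everything on each pass.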

Egor Guz (eghobo) wrote :

@hongbin: I am not sure it's a Swarm template issue. I think something is wrong with the image (or we configure it wrong), because Kubernetes instances have the same symptoms.

Eli Qiao (taget-9)
Changed in magnum:
assignee: nobody → Eli Qiao (taget-9)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/228762

Changed in magnum:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/228762
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=832aa604363aebef178129a84d05a6b71de48cae
Submitter: Jenkins
Branch: master

commit 832aa604363aebef178129a84d05a6b71de48cae
Author: Eli Qiao <email address hidden>
Date: Tue Sep 29 14:58:04 2015 +0800

    Swarm: Set to CREATE_FAILED status if swarm services not started

    This patch checks the $NODE_SERVICES status before sending the SUCCESS signal
    to Heat cfn; this makes sure the bay is usable.

    Change-Id: Ie232c578c5c27b1842965bdda481096fb0b5c820
    Closes-Bug: #1500291
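The commit's approach, checking every entry in `$NODE_SERVICES` before the SUCCESS signal goes to Heat, can be sketched as follows. Because the commit message refers to a shell variable, the actual change presumably lives in a script in the swarm template; this Python version only illustrates the logic. It assumes the services are systemd units (checked with `systemctl is-active --quiet`) and a CloudFormation-style wait-condition body with `Status`, `Reason`, `UniqueId` and `Data` fields; the function names are hypothetical.

```python
import json
import subprocess
from typing import Callable, Iterable, List


def systemd_unit_active(unit: str) -> bool:
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def failed_services(services: Iterable[str],
                    is_active: Callable[[str], bool] = systemd_unit_active) -> List[str]:
    """Return the entries of NODE_SERVICES that are not running."""
    return [name for name in services if not is_active(name)]


def wait_condition_payload(services: Iterable[str], unique_id: str,
                           is_active: Callable[[str], bool] = systemd_unit_active) -> str:
    """Build the JSON body for the Heat wait-condition signal.

    Sending FAILURE instead of SUCCESS fails the wait condition, so the
    stack (and therefore the bay) ends up CREATE_FAILED, not CREATE_COMPLETE.
    """
    failed = failed_services(services, is_active)
    if failed:
        status, reason = "FAILURE", "services not running: " + ", ".join(failed)
    else:
        status, reason = "SUCCESS", "all node services are active"
    return json.dumps({"Status": status, "Reason": reason,
                       "UniqueId": unique_id, "Data": reason})
```

On a node, `wait_condition_payload(["swarm-manager", "swarm-agent"], "1")` would run the systemd check (the unit names here are borrowed from the container names in the report and are assumptions), and the resulting JSON would then be sent to the wait-condition handle URL that Heat gives the instance.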

Changed in magnum:
status: In Progress → Fix Committed
Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-2
status: Fix Committed → Fix Released
Adrian Otto (aotto)
Changed in magnum:
milestone: mitaka-2 → mitaka-1