Fuel for OpenStack

Cluster re-deployment failed, but Fuel returned 'success'

Bug #1358735 reported by Artem Panchenko on 2014-08-19

This bug affects 1 person

Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Vladimir Sharshov | Fuel for OpenStack 5.1

Bug Description

api: '1.0'
astute_sha: 8e1db3926b2320b30b23d7a772122521b0d96166
auth_required: true
build_id: 2014-08-18_11-13-09
build_number: '449'
feature_groups:
- mirantis
fuellib_sha: 2c9ad4aec9f3b6fc060cb5a394733607f07063c1
fuelmain_sha: 08f04775dcfadd8f5b438a31c63e81f29276b7d3
nailgun_sha: bc9e377dbe010732bc2ba47161ed9d433998e07b
ostf_sha: d2a894d228c1f3c22595a77f04b1e00d09d8e463
production: docker
release: '5.1'

Steps to reproduce:

1. Create a new cluster: CentOS, HA, Neutron GRE, Ceph for volumes/images/ephemeral. Add 3 controller+ceph nodes and 1 compute node, then deploy changes.
2. After a successful deployment, delete a non-primary controller from the cluster and add 1 new controller+ceph node and 2 compute+ceph nodes from the unallocated nodes. Deploy changes.

Expected result:

- the old controller is removed from the cluster and the new nodes are successfully added to the cloud

Actual:

- the old controller was removed and returned to the unallocated nodes, deployment of the new controller failed, and the new computes remain in the 'provisioned' state. Nevertheless, Fuel reported success:

"Successfully removed 1 node(s). No errors occurred
Deployment of environment 'TestEnv01' is done. Access the OpenStack dashboard (Horizon) at http://172.16.16.130/"

Here is the output of the 'fuel task' command:

[root@fuel-lab-cz5551 test]# fuel task
id | status | name | cluster | progress | uuid
---|---------|---------------|---------|----------|-------------------------------------
14 | ready | node_deletion | 1 | 100 | 2824a9a4-bd48-4386-863b-79445bb06d55
15 | ready | provision | 1 | 100 | dbb6b3aa-77b3-40a6-99d5-2ed55ded3020
16 | ready | deployment | 1 | 100 | b2c074ae-7fb1-434f-a4b0-430c813a10cc
11 | ready | deploy | 1 | 100 | 54384494-b74f-444c-b01a-7d1a2f794717

and of 'fuel nodes':

[root@fuel-lab-cz5551 test]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|------------------|---------|---------------|-------------------|----------------------|---------------|-------
3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
6 | ready | Untitled (32:cc) | 1 | 172.16.37.197 | 8a:d5:c9:86:d3:48 | ceph-osd, controller | | True
1 | ready | Untitled (4a:ac) | 1 | 172.16.37.195 | e6:6a:ff:f7:37:43 | compute | | True
2 | provisioned | Untitled (48:4c) | 1 | 172.16.37.199 | a0:2b:b8:1f:48:4c | ceph-osd, compute | | True
4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
7 | ready | Untitled (2c:d8) | 1 | 172.16.37.198 | da:76:c6:6a:40:46 | ceph-osd, controller | | True
8 | discover | Untitled (16:ec) | None | 172.16.37.234 | a0:d3:c1:ef:16:ec | | | True

As you can see, the new controller (node-4) is in the 'ready' state, but its deployment actually failed and no cluster services are running on it:

http://paste.openstack.org/show/97285/

Here is the relevant part of the Astute logs:

http://paste.openstack.org/show/97291/

Unfortunately, I can't attach a diagnostic snapshot because it is too large (>20 GB), but I can provide access to the environment.
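
The mismatch between the two listings above (the 'deploy' task is 'ready' while some cluster nodes are only 'provisioned') can be checked mechanically. Below is a minimal sketch in Python that parses the pipe-separated tables copied from this report. The helper names (parse_cli_table, undeployed_nodes) are made up for illustration and are not part of the Fuel CLI.

```python
TASKS = """
id | status | name | cluster | progress | uuid
---|---------|---------------|---------|----------|-------------------------------------
14 | ready | node_deletion | 1 | 100 | 2824a9a4-bd48-4386-863b-79445bb06d55
15 | ready | provision | 1 | 100 | dbb6b3aa-77b3-40a6-99d5-2ed55ded3020
16 | ready | deployment | 1 | 100 | b2c074ae-7fb1-434f-a4b0-430c813a10cc
11 | ready | deploy | 1 | 100 | 54384494-b74f-444c-b01a-7d1a2f794717
"""

NODES = """
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|------------------|---------|---------------|-------------------|----------------------|---------------|-------
3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
6 | ready | Untitled (32:cc) | 1 | 172.16.37.197 | 8a:d5:c9:86:d3:48 | ceph-osd, controller | | True
1 | ready | Untitled (4a:ac) | 1 | 172.16.37.195 | e6:6a:ff:f7:37:43 | compute | | True
2 | provisioned | Untitled (48:4c) | 1 | 172.16.37.199 | a0:2b:b8:1f:48:4c | ceph-osd, compute | | True
4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
7 | ready | Untitled (2c:d8) | 1 | 172.16.37.198 | da:76:c6:6a:40:46 | ceph-osd, controller | | True
8 | discover | Untitled (16:ec) | None | 172.16.37.234 | a0:d3:c1:ef:16:ec | | | True
"""


def parse_cli_table(text):
    """Turn a pipe-separated table into a list of dicts keyed by column name."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        # Skip the separator row, which consists only of '-' and '|'.
        if set(ln.replace(" ", "")) <= {"-", "|"}:
            continue
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def undeployed_nodes(task_rows, node_rows, cluster_id):
    """Return ids of cluster nodes that are not 'ready' although the
    cluster's 'deploy' task is reported as 'ready'."""
    deploy_ready = any(
        t["name"] == "deploy" and t["cluster"] == cluster_id and t["status"] == "ready"
        for t in task_rows
    )
    if not deploy_ready:
        return []
    return [
        n["id"]
        for n in node_rows
        if n["cluster"] == cluster_id and n["status"] != "ready"
    ]


if __name__ == "__main__":
    print(undeployed_nodes(parse_cli_table(TASKS), parse_cli_table(NODES), "1"))
```

On the data from this report the script prints ['3', '2'], i.e. the two new computes. It cannot catch node-4, because the status recorded for that node is itself wrong; that case needs a check on the node.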

Artem Panchenko (apanchenko-8) on 2014-08-19

Changed in fuel:
assignee:	nobody → Vladimir Sharshov (vsharshov)

Dima Shulyak (dshulyak) wrote on 2014-08-19:

No error status was published to the Nailgun queue.

The log shows the error state, but it was not reported:

2014-08-19 10:48:48 DEBUG
[449] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"deploying", "role"=>"primary-controller", "progress"=>50}]}
2014-08-19 10:48:48 DEBUG
[449] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"error", "error_type"=>"deploy", "role"=>"primary-controller"}]}
2014-08-19 10:48:48 DEBUG
[449] Node "4" has failed to deploy. There is no more retries for puppet run.
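
To illustrate the behaviour expected here, below is a rough sketch in Python of a proxy reporter that always forwards an 'error' status upstream and never lets a later update overwrite it. This is not Astute's code (Astute is written in Ruby); the class and function names are hypothetical, and only the message shape is taken from the log lines above.

```python
class ProxyReporter:
    """Hypothetical reporter that forwards node status updates upstream."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable that receives {"nodes": [...]}
        self.last = {}            # node uid -> last status forwarded

    def report(self, data):
        to_send = []
        for node in data.get("nodes", []):
            uid = node["uid"]
            # 'error' is treated as terminal: later updates must not hide it.
            if self.last.get(uid) == "error":
                continue
            self.last[uid] = node["status"]
            to_send.append(node)
        if to_send:
            self.upstream({"nodes": to_send})


def overall_status(last_statuses):
    """Derive a deployment status from the last known status of each node."""
    statuses = list(last_statuses.values())
    if any(s == "error" for s in statuses):
        return "error"
    if statuses and all(s == "ready" for s in statuses):
        return "ready"
    return "deploying"


if __name__ == "__main__":
    sent = []
    reporter = ProxyReporter(sent.append)
    reporter.report({"nodes": [{"uid": "4", "status": "deploying",
                                "role": "primary-controller", "progress": 50}]})
    reporter.report({"nodes": [{"uid": "4", "status": "error",
                                "error_type": "deploy",
                                "role": "primary-controller"}]})
    # A late 'ready' for the failed node is dropped instead of forwarded.
    reporter.report({"nodes": [{"uid": "4", "status": "ready"}]})
    print(overall_status(reporter.last))
```

With this rule the failure of node "4" reaches the upstream consumer, and the overall result is 'error' rather than 'ready'.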

Artem Panchenko (apanchenko-8) wrote on 2014-08-20:

Here is the link to download diagnostic snapshot:

https://yadi.sk/d/-4M0t2YMa6rjH

Vladimir Kuklin (vkuklin) on 2014-08-20

Changed in fuel:
status:	New → Confirmed

Vladimir Sharshov (vsharshov) on 2014-08-20

Changed in fuel:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-08-20: Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/115689

OpenStack Infra (hudson-openstack) wrote on 2014-08-21: Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/115689
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=35b68bc55e077d881e7476a11dfad0a018662677
Submitter: Jenkins
Branch: master

commit 35b68bc55e077d881e7476a11dfad0a018662677
Author: Vladimir Sharshov <email address hidden>
Date: Wed Aug 20 19:37:44 2014 +0400

Send correct node status in case of critical node fail

Change-Id: I8257ca596db6bfa5f55d836cb1cfeac41ce4979c
Closes-Bug: #1358735

Changed in fuel:
status:	In Progress → Fix Committed
