Cluster re-deployment failed, but Fuel returned 'success'

Bug #1358735 reported by Artem Panchenko
This bug affects 1 person
Affects            | Status        | Importance | Assigned to       | Milestone
-------------------|---------------|------------|-------------------|----------
Fuel for OpenStack | Fix Committed | High       | Vladimir Sharshov |

Bug Description

api: '1.0'
astute_sha: 8e1db3926b2320b30b23d7a772122521b0d96166
auth_required: true
build_id: 2014-08-18_11-13-09
build_number: '449'
feature_groups:
- mirantis
fuellib_sha: 2c9ad4aec9f3b6fc060cb5a394733607f07063c1
fuelmain_sha: 08f04775dcfadd8f5b438a31c63e81f29276b7d3
nailgun_sha: bc9e377dbe010732bc2ba47161ed9d433998e07b
ostf_sha: d2a894d228c1f3c22595a77f04b1e00d09d8e463
production: docker
release: '5.1'

Steps to reproduce:

1. Create a new cluster: CentOS, HA, Neutron with GRE, Ceph for volumes/images/ephemeral. Add 3 controller+Ceph nodes and 1 compute node, then deploy changes.
2. After a successful deployment, delete a non-primary controller from the cluster and add 1 new controller+Ceph node and 2 compute+Ceph nodes from the unallocated nodes. Deploy changes.

Expected result:

- the old controller is removed from the cluster and the new nodes are successfully added to the cloud.

Actual:

- the old controller was removed and returned to the unallocated nodes, but deployment of the new controller failed and the new computes remain in the 'provisioned' state. Nevertheless, Fuel reported success:

"Successfully removed 1 node(s). No errors occurred
Deployment of environment 'TestEnv01' is done. Access the OpenStack dashboard (Horizon) at http://172.16.16.130/"

Here is the output of the 'fuel task' command:

[root@fuel-lab-cz5551 test]# fuel task
id | status | name | cluster | progress | uuid
---|---------|---------------|---------|----------|-------------------------------------
14 | ready | node_deletion | 1 | 100 | 2824a9a4-bd48-4386-863b-79445bb06d55
15 | ready | provision | 1 | 100 | dbb6b3aa-77b3-40a6-99d5-2ed55ded3020
16 | ready | deployment | 1 | 100 | b2c074ae-7fb1-434f-a4b0-430c813a10cc
11 | ready | deploy | 1 | 100 | 54384494-b74f-444c-b01a-7d1a2f794717

and of 'fuel nodes':

[root@fuel-lab-cz5551 test]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|------------------|---------|---------------|-------------------|----------------------|---------------|-------
3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
6 | ready | Untitled (32:cc) | 1 | 172.16.37.197 | 8a:d5:c9:86:d3:48 | ceph-osd, controller | | True
1 | ready | Untitled (4a:ac) | 1 | 172.16.37.195 | e6:6a:ff:f7:37:43 | compute | | True
2 | provisioned | Untitled (48:4c) | 1 | 172.16.37.199 | a0:2b:b8:1f:48:4c | ceph-osd, compute | | True
4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
7 | ready | Untitled (2c:d8) | 1 | 172.16.37.198 | da:76:c6:6a:40:46 | ceph-osd, controller | | True
8 | discover | Untitled (16:ec) | None | 172.16.37.234 | a0:d3:c1:ef:16:ec | | | True

As you can see, the new controller (node-4) is in the 'ready' state, but its deployment actually failed and no cluster services are running on it:

http://paste.openstack.org/show/97285/
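A false success like this one can be partly spotted from the CLI output alone: if every task for the cluster is 'ready' while some of the cluster's nodes are not, the reported success cannot be trusted. The Python sketch below assumes the pipe-delimited layout printed above; `parse_table` and `inconsistent_nodes` are hypothetical helpers, not part of the Fuel client or its API. Note that it only flags nodes 2 and 3 here: node 4 is already (wrongly) marked 'ready', so catching it still requires inspecting the node itself, as in the paste above.

```python
from typing import Dict, List

# Output copied from this report.
TASKS = """\
id | status | name | cluster | progress | uuid
---|---------|---------------|---------|----------|-------------------------------------
14 | ready | node_deletion | 1 | 100 | 2824a9a4-bd48-4386-863b-79445bb06d55
15 | ready | provision | 1 | 100 | dbb6b3aa-77b3-40a6-99d5-2ed55ded3020
16 | ready | deployment | 1 | 100 | b2c074ae-7fb1-434f-a4b0-430c813a10cc
11 | ready | deploy | 1 | 100 | 54384494-b74f-444c-b01a-7d1a2f794717
"""

NODES = """\
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|------------------|---------|---------------|-------------------|----------------------|---------------|-------
3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
6 | ready | Untitled (32:cc) | 1 | 172.16.37.197 | 8a:d5:c9:86:d3:48 | ceph-osd, controller | | True
1 | ready | Untitled (4a:ac) | 1 | 172.16.37.195 | e6:6a:ff:f7:37:43 | compute | | True
2 | provisioned | Untitled (48:4c) | 1 | 172.16.37.199 | a0:2b:b8:1f:48:4c | ceph-osd, compute | | True
4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
7 | ready | Untitled (2c:d8) | 1 | 172.16.37.198 | da:76:c6:6a:40:46 | ceph-osd, controller | | True
8 | discover | Untitled (16:ec) | None | 172.16.37.234 | a0:d3:c1:ef:16:ec | | | True
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn a pipe-delimited CLI table into a list of row dicts keyed by header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if line.startswith("---"):  # separator line under the header
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def inconsistent_nodes(tasks_text: str, nodes_text: str, cluster: str) -> List[str]:
    """Return ids of cluster nodes that are not 'ready' although every task is 'ready'."""
    tasks = [t for t in parse_table(tasks_text) if t["cluster"] == cluster]
    if not tasks or any(t["status"] != "ready" for t in tasks):
        return []  # Fuel did not claim success, so there is nothing to cross-check
    nodes = [n for n in parse_table(nodes_text) if n["cluster"] == cluster]
    return [n["id"] for n in nodes if n["status"] != "ready"]


print(inconsistent_nodes(TASKS, NODES, "1"))  # ['3', '2']
```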

Here is the relevant part of the Astute logs:

http://paste.openstack.org/show/97291/

Unfortunately, I can't attach a diagnostic snapshot because it is too large (>20 GB), but I can provide access to the environment.

Changed in fuel:
assignee: nobody → Vladimir Sharshov (vsharshov)
Dima Shulyak (dshulyak) wrote :

No error status was published to the Nailgun queue.

The log contains the error state, but it was never reported:

2014-08-19 10:48:48 DEBUG [449] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"deploying", "role"=>"primary-controller", "progress"=>50}]}
2014-08-19 10:48:48 DEBUG [449] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"error", "error_type"=>"deploy", "role"=>"primary-controller"}]}
2014-08-19 10:48:48 DEBUG [449] Node "4" has failed to deploy. There is no more retries for puppet run.
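The excerpt shows the failure mode: the 'error' update for node 4 reached DeploymentProxyReporter but was never published. The Python sketch below models the invariant such a reporter needs: an 'error' status is always forwarded, and once a node has been reported as 'error' no later update may overwrite it. The `ProxyReporter` class, its `report` method and the `upstream` callback are hypothetical names, not Astute's actual API, and the third message replayed at the end is invented for illustration.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]
Payload = Dict[str, List[Node]]


class ProxyReporter:
    """Hypothetical proxy that forwards node updates upstream (e.g. to Nailgun).

    Illustrative only, not Astute's DeploymentProxyReporter: it encodes the
    invariant that an 'error' status is always forwarded and is final.
    """

    def __init__(self, upstream: Callable[[Payload], None]) -> None:
        self.upstream = upstream
        self.last_status: Dict[str, str] = {}  # node uid -> last status forwarded

    def report(self, data: Payload) -> None:
        forward: List[Node] = []
        for node in data.get("nodes", []):
            uid = str(node["uid"])
            if self.last_status.get(uid) == "error":
                continue  # 'error' is final: drop any later update for this node
            forward.append(node)
            self.last_status[uid] = str(node.get("status"))
        if forward:
            self.upstream({"nodes": forward})


# Replay the two messages from the log excerpt, plus a hypothetical late 'ready'.
published: List[Payload] = []
reporter = ProxyReporter(published.append)
reporter.report({"nodes": [{"uid": "4", "status": "deploying", "role": "primary-controller", "progress": 50}]})
reporter.report({"nodes": [{"uid": "4", "status": "error", "error_type": "deploy", "role": "primary-controller"}]})
reporter.report({"nodes": [{"uid": "4", "status": "ready", "role": "primary-controller", "progress": 100}]})
print([p["nodes"][0]["status"] for p in published])  # ['deploying', 'error']
```

Making 'error' sticky in the proxy is a deliberate choice: a lost progress update is harmless, but a lost or overwritten failure turns into the false success described in this report.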

Artem Panchenko (apanchenko-8) wrote :

Here is the link to download the diagnostic snapshot:

https://yadi.sk/d/-4M0t2YMa6rjH

Changed in fuel:
status: New → Confirmed
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/115689

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/115689
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=35b68bc55e077d881e7476a11dfad0a018662677
Submitter: Jenkins
Branch: master

commit 35b68bc55e077d881e7476a11dfad0a018662677
Author: Vladimir Sharshov <email address hidden>
Date: Wed Aug 20 19:37:44 2014 +0400

    Send correct node status in case of critical node fail

    Change-Id: I8257ca596db6bfa5f55d836cb1cfeac41ce4979c
    Closes-Bug: #1358735
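The commit message gives only its title. As a hypothetical illustration of what "correct node status in case of critical node fail" could mean for this report (an assumption, not a description of the merged patch), the Python sketch below builds a status payload after a critical node such as the primary-controller fails: the failed node and every other node in the deployment that has not reached 'ready' are reported as 'error', so an aborted deployment cannot be read as a success. The function name `critical_failure_report` is hypothetical; the payload keys are the ones visible in the log excerpt above.

```python
from typing import Dict, List

Node = Dict[str, str]


def critical_failure_report(nodes: List[Node], failed_uid: str) -> Dict[str, List[Node]]:
    """Build a node-status payload after a critical node fails (hypothetical).

    The failed node, and every other node in the deployment that has not
    reached 'ready', is reported as 'error'.
    """
    report: List[Node] = []
    for node in nodes:
        if node["uid"] == failed_uid or node["status"] != "ready":
            report.append({"uid": node["uid"], "status": "error",
                           "error_type": "deploy", "role": node["role"]})
    return {"nodes": report}


# Nodes of the second deployment in this report (one role per node for brevity).
deployment = [
    {"uid": "4", "status": "deploying", "role": "primary-controller"},
    {"uid": "2", "status": "provisioned", "role": "compute"},
    {"uid": "3", "status": "provisioned", "role": "compute"},
]
print([n["uid"] for n in critical_failure_report(deployment, "4")["nodes"]])  # ['4', '2', '3']
```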

Changed in fuel:
status: In Progress → Fix Committed