Cluster re-deployment failed, but Fuel returned 'success'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Vladimir Sharshov |
Bug Description
api: '1.0'
astute_sha: 8e1db3926b2320b
auth_required: true
build_id: 2014-08-18_11-13-09
build_number: '449'
feature_groups:
- mirantis
fuellib_sha: 2c9ad4aec9f3b6f
fuelmain_sha: 08f04775dcfadd8
nailgun_sha: bc9e377dbe01073
ostf_sha: d2a894d228c1f3c
production: docker
release: '5.1'
Steps to reproduce:
1. Create a new cluster: CentOS, HA, Neutron GRE, Ceph for volumes.
2. After a successful deployment, delete the non-primary controller from the cluster, add 1 new controller+ceph and 2 computes+ceph from the unallocated nodes, and deploy changes.
Expected result:
- the old controller is removed from the cluster and the new nodes are successfully added to the cloud
Actual result:
- the old controller is removed and returned to the unallocated nodes, deployment of the new controller failed, and the new computes remain in the 'provisioned' state. But Fuel reported success:
"Successfully removed 1 node(s). No errors occurred
Deployment of environment 'TestEnv01' is done. Access the OpenStack dashboard (Horizon) at http://
Here is the output of the 'fuel task' command:
[root@fuel-
id | status | name | cluster | progress | uuid
---|---|---|---|---|---
14 | ready | node_deletion | 1 | 100 | 2824a9a4-
15 | ready | provision | 1 | 100 | dbb6b3aa-
16 | ready | deployment | 1 | 100 | b2c074ae-
11 | ready | deploy | 1 | 100 | 54384494-
and 'fuel node':
[root@fuel-
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|---|---|---|---|---|---|---|---
3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
6 | ready | Untitled (32:cc) | 1 | 172.16.37.197 | 8a:d5:c9:86:d3:48 | ceph-osd, controller | | True
1 | ready | Untitled (4a:ac) | 1 | 172.16.37.195 | e6:6a:ff:f7:37:43 | compute | | True
2 | provisioned | Untitled (48:4c) | 1 | 172.16.37.199 | a0:2b:b8:1f:48:4c | ceph-osd, compute | | True
4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
7 | ready | Untitled (2c:d8) | 1 | 172.16.37.198 | da:76:c6:6a:40:46 | ceph-osd, controller | | True
8 | discover | Untitled (16:ec) | None | 172.16.37.234 | a0:d3:c1:ef:16:ec | | | True
As you can see, the new controller (node-4) is in the 'ready' state, but its deployment actually failed and no cluster services are running on it:
http://
Here is the relevant part of the Astute logs:
http://
Unfortunately, I can't attach a diagnostic snapshot because it is too large (>20 GB), but I can provide access to the environment.
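As a quick cross-check of the reported state, the pipe-delimited `fuel node` output above can be scanned for nodes that never reached 'ready'. This is a hypothetical helper sketch, not part of Fuel; the two sample rows are copied from the table above:

```ruby
# Hypothetical sketch: parse `fuel node`-style pipe-delimited output and flag
# any cluster node whose status is not 'ready'. Sample rows are taken from
# the table in this report (the stuck 'provisioned' compute and a 'ready' node).
TABLE = <<~EOS
  id | status | name | cluster | ip | mac | roles | pending_roles | online
  3 | provisioned | Untitled (4a:88) | 1 | 172.16.37.200 | a0:2b:b8:1f:4a:88 | ceph-osd, compute | | True
  4 | ready | Untitled (4a:90) | 1 | 172.16.37.201 | 1e:d7:fc:40:3c:45 | ceph-osd, controller | | True
EOS

# Split each data row on '|' and strip whitespace from every field.
rows = TABLE.lines.drop(1).map { |line| line.split('|').map(&:strip) }

# Keep the rows whose status column is anything other than 'ready'.
not_ready = rows.select { |id, status| status != 'ready' }
not_ready.each { |id, status| puts "node #{id} is only '#{status}'" }
```

Note that this only catches nodes stuck in 'provisioned'; it cannot catch node-4, whose status was wrongly set to 'ready' — which is exactly the bug.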
Changed in fuel:
assignee: nobody → Vladimir Sharshov (vsharshov)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
status: Confirmed → In Progress
No error status was published to the Nailgun queue. The error state does appear in the log, but it was never reported:
2014-08-19 10:48:48 DEBUG Reporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"deploying", "role"=>"primary-controller", "progress"=>50}]}
Reporter to report it up: {"nodes"=>[{"uid"=>"4", "status"=>"error", "error_type"=>"deploy", "role"=>"primary-controller"}]}
2014-08-19 10:48:48 DEBUG [449] Data send by DeploymentProxy
2014-08-19 10:48:48 DEBUG [449] Data received by DeploymentProxy
2014-08-19 10:48:48 DEBUG [449] Node "4" has failed to deploy. There is no more retries for puppet run.
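The log above shows a per-node payload with "status" => "error" that never made it into the overall task result. A minimal sketch (not Astute's actual code) of the aggregation that appears to be missing, using the payload shape from the log:

```ruby
# Minimal sketch of the missing aggregation: if any node in the report carries
# "status" => "error", the overall deployment status should be 'error', not
# success. The report hash mirrors the payload logged by the Reporter above.
report = {
  "nodes" => [
    { "uid" => "4", "status" => "error", "error_type" => "deploy",
      "role" => "primary-controller" }
  ]
}

# Collapse per-node statuses into one task-level status.
def overall_status(report)
  statuses = report["nodes"].map { |n| n["status"] }
  statuses.include?("error") ? "error" : "ready"
end

puts overall_status(report)  # => "error", yet Fuel reported success
```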