[Reduced footprint] Deploy fails with Task[primary-openstack-controller/8]

Bug #1595892 reported by Ksenia Svechnikova
Affects: Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x: Invalid, High, assigned to MOS Nova
  9.x:    Invalid, High, assigned to MOS Nova

Bug Description

Fuel 9.0 (build 376)

Detailed bug description: Deploy with the compute and virt roles fails with:

"All nodes are finished. Failed tasks: Task[primary-openstack-controller/8] Stopping the deployment process!"
"for report failed: Task status provided '' is not supported; Task name is not provided"

Steps to reproduce:

0) Prepare a HW environment with 1 node
1) Add FEATURE_GROUPS: ["advanced"] to /etc/nailgun/settings.yaml
2) service nailgun restart
3) Add the virt and compute roles to the node
4) Add a JSON config to the virt role:

[{"id": 1, "cpu": 2, "mem": 4}, {"id": 2, "cpu": 2, "mem": 4}]
[{"id": 3, "cpu": 2, "mem": 4}]

5) Run the spawn_vm task
6) Assign the controller role to the new KVM nodes (3 nodes)
7) Deploy the cluster

Expected results: cluster deployed successfully
Actual result: Deploy fails
[root@lab5-fuel9 ~]# fuel node --env 1
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+---------+-----------------------------+---------+--------------+-------------------+---------------+---------------+--------+---------
 5 | stopped | cz7367-kvm.host-telecom.com | 1 | 172.16.40.76 | 0c:c4:7a:15:11:18 | compute, virt | | 1 | 1
 4 | stopped | cz7366-kvm.host-telecom.com | 1 | 172.16.40.75 | 0c:c4:7a:15:00:e0 | compute, virt | | 1 | 1
 8 | error | Untitled (da:cf) | 1 | 172.16.40.77 | 52:54:00:b1:da:cf | controller | | 1 | 1
10 | stopped | Untitled (ed:e3) | 1 | 172.16.40.79 | 52:54:00:59:ed:e3 | controller | | 1 | 1
 9 | stopped | Untitled (45:8f) | 1 | 172.16.40.78 | 52:54:00:b5:45:8f | controller | | 1 | 1
[root@lab5-fuel9 ~]#

Puppet log from node-8: http://paste.openstack.org/show/521881/

nova-api log from node-8: http://paste.openstack.org/show/521882/

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

HW lab with this issue is alive, please contact me for access

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :
tags: added: regression
Revision history for this message
slava valyavskiy (slava-val-al) wrote :
Revision history for this message
Dina Belova (dbelova) wrote :

Moving to 9.0-updates due to the high priority and the fact that the issue is only observed with a specific feature.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

So from the Puppet logs you can see that the "nova-manage db sync" call timed out:

2016-06-24 07:53:28 +0000 /Stage[main]/Nova::Deps/Anchor[nova::config::end] (notice): Triggered 'refresh' from 104 events
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Failed to call refresh: Command exceeded timeout
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Command exceeded timeout
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `waitpid2'
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `execute'
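The five-minute gap between those two log lines matches Puppet's default Exec `timeout` of 300 seconds (assuming the manifest did not override it), which a quick check confirms:

```python
from datetime import datetime

# Timestamps taken from the Puppet log lines above
refresh_triggered = datetime(2016, 6, 24, 7, 53, 28)
timeout_reported = datetime(2016, 6, 24, 7, 58, 28)

elapsed = (timeout_reported - refresh_triggered).total_seconds()
print(elapsed)  # 300.0, i.e. Puppet Exec's default timeout of 300 s
```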

I assume that Puppet forcefully stops the child process in this case. This is confirmed by the snippet Slava provided above:

root@node-8:~# nova-manage db sync
error: (_mysql_exceptions.OperationalError) (1050, "Table 'instances' already exists")

^ the error above can only be seen when a schema migration failed in the middle and you try to run the corresponding migration script again. In the normal case it would be skipped: db sync first checks the current schema version and skips all prior migration scripts that have already been applied. In your case, though, the migration must have failed *right before* the version counter in the DB was updated. Unfortunately, MySQL does not support transactional DDL, so all such errors are fatal: you have to manually clean up the database after a failed migration. Puppet should really stop execution of the manifests at this point.
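The failure mode described above can be sketched in a few lines (a hypothetical model of sqlalchemy-migrate-style versioning, not Nova's actual code): db sync only runs scripts numbered above the stored schema version, and the version is bumped only after a script succeeds, so a script killed mid-way (here, by the Puppet timeout) leaves its tables behind while the version counter stays put, and the retry hits "table already exists":

```python
# Hypothetical sketch, not Nova's actual migration code: why a retried
# "db sync" fails with "Table 'instances' already exists".

def db_sync(db, scripts):
    """Apply every migration script numbered above the stored version."""
    for number, script in sorted(scripts.items()):
        if number <= db["version"]:
            continue            # already applied: skipped on a clean re-run
        script(db)              # DDL takes effect immediately (MySQL has
                                # no transactional DDL, so no rollback)
        db["version"] = number  # bumped only *after* the script succeeds

def create_instances_table(db):
    if "instances" in db["tables"]:
        raise RuntimeError('(1050, "Table \'instances\' already exists")')
    db["tables"].add("instances")
    raise TimeoutError("killed by Puppet before the version bump")

db = {"version": 0, "tables": set()}
try:
    db_sync(db, {1: create_instances_table})
except TimeoutError:
    pass                        # first run dies mid-migration

# The table exists, but the version counter never moved ...
print(db["version"], db["tables"])  # 0 {'instances'}

# ... so a retry re-runs the same script and fails like node-8 did:
try:
    db_sync(db, {1: create_instances_table})
except RuntimeError as exc:
    print(exc)
```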

Unfortunately, atop logs start from 10:14, so I can't see *why* the node was so busy that "nova-manage db sync" did not manage to finish in time. Most likely it's an overloaded environment (hdd?).

Anyway, there is nothing nova-specific about this failure. Nova just happens to have one of the largest DB schemas.
