[Reduced footprint] Deploy fails with Task[primary-openstack-controller/8]

Bug #1595892 reported by Ksenia Svechnikova
Affects: Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x: Invalid, High, assigned to MOS Nova
  9.x:    Invalid, High, assigned to MOS Nova

Bug Description

Fuel 9.0 (build 376)

Detailed bug description: Deploy with the compute and virt roles fails with:

"All nodes are finished. Failed tasks: Task[primary-openstack-controller/8] Stopping the deployment process!"
"for report failed: Task status provided '' is not supported; Task name is not provided"

Steps to reproduce:

0) Prepare a HW environment with 1 node
1) Add FEATURE_GROUPS: ["advanced"] to /etc/nailgun/settings.yaml
2) service nailgun restart
3) Add the virt and compute roles to the node
4) Add a JSON config to the virt role:

[{"id": 1, "cpu": 2, "mem": 4}, {"id": 2, "cpu": 2, "mem": 4}]
[{"id": 3, "cpu": 2, "mem": 4}]

5) Run the spawn_vm task
6) Assign the controller role to the new KVM nodes (3 nodes)
7) Deploy the cluster

Expected results: cluster deployed successfully
Actual result: Deploy fails
[root@lab5-fuel9 ~]# fuel node --env 1
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+---------+-----------------------------+---------+--------------+-------------------+---------------+---------------+--------+---------
 5 | stopped | cz7367-kvm.host-telecom.com | 1 | 172.16.40.76 | 0c:c4:7a:15:11:18 | compute, virt | | 1 | 1
 4 | stopped | cz7366-kvm.host-telecom.com | 1 | 172.16.40.75 | 0c:c4:7a:15:00:e0 | compute, virt | | 1 | 1
 8 | error | Untitled (da:cf) | 1 | 172.16.40.77 | 52:54:00:b1:da:cf | controller | | 1 | 1
10 | stopped | Untitled (ed:e3) | 1 | 172.16.40.79 | 52:54:00:59:ed:e3 | controller | | 1 | 1
 9 | stopped | Untitled (45:8f) | 1 | 172.16.40.78 | 52:54:00:b5:45:8f | controller | | 1 | 1
[root@lab5-fuel9 ~]#

Puppet log from node-8: http://paste.openstack.org/show/521881/

nova-api log from node-8: http://paste.openstack.org/show/521882/

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

HW lab with this issue is alive, please contact me for access

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :
tags: added: regression
Revision history for this message
slava valyavskiy (slava-val-al) wrote :
Revision history for this message
Dina Belova (dbelova) wrote :

Moving to 9.0-updates due to the high priority and the fact that the issue is only observed with a specific feature.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

So from the Puppet logs you can see that the "nova-manage db sync" call timed out:

2016-06-24 07:53:28 +0000 /Stage[main]/Nova::Deps/Anchor[nova::config::end] (notice): Triggered 'refresh' from 104 events
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Failed to call refresh: Command exceeded timeout
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Command exceeded timeout
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `waitpid2'
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `execute'
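The five-minute gap between those two log lines matches Puppet's default Exec `timeout` of 300 seconds (assuming the manifest did not override it), which a quick check confirms:

```python
from datetime import datetime

# Timestamps taken from the Puppet log lines above
refresh_triggered = datetime(2016, 6, 24, 7, 53, 28)
timeout_reported = datetime(2016, 6, 24, 7, 58, 28)

elapsed = (timeout_reported - refresh_triggered).total_seconds()
print(elapsed)  # 300.0, i.e. Puppet Exec's default timeout of 300 s
```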

I assume that Puppet forcefully stops the child process in this case. This is confirmed by the snippet Slava provided above:

root@node-8:~# nova-manage db sync
error: (_mysql_exceptions.OperationalError) (1050, "Table 'instances' already exists")

^ the error above can only be seen when a schema migration failed in the middle and you try to run the corresponding migration script again. In the normal case it would be skipped: db sync first checks the current schema version and skips all prior migration scripts that have already been applied. In your case, though, the migration must have failed *right before* the version counter in the DB was updated. Unfortunately, MySQL does not support transactional DDL, so all such errors are fatal: you have to manually clean up the database after a failed migration. Puppet should really stop execution of the manifests at this point.
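The failure mode described above can be sketched in a few lines (a hypothetical model of sqlalchemy-migrate-style versioning, not Nova's actual code): db sync only runs scripts numbered above the stored schema version, and the version is bumped only after a script succeeds, so a script killed mid-way (here, by the Puppet timeout) leaves its tables behind while the version counter stays put, and the retry hits "table already exists":

```python
# Hypothetical sketch, not Nova's actual migration code: why a retried
# "db sync" fails with "Table 'instances' already exists".

def db_sync(db, scripts):
    """Apply every migration script numbered above the stored version."""
    for number, script in sorted(scripts.items()):
        if number <= db["version"]:
            continue            # already applied: skipped on a clean re-run
        script(db)              # DDL takes effect immediately (MySQL has
                                # no transactional DDL, so no rollback)
        db["version"] = number  # bumped only *after* the script succeeds

def create_instances_table(db):
    if "instances" in db["tables"]:
        raise RuntimeError('(1050, "Table \'instances\' already exists")')
    db["tables"].add("instances")
    raise TimeoutError("killed by Puppet before the version bump")

db = {"version": 0, "tables": set()}
try:
    db_sync(db, {1: create_instances_table})
except TimeoutError:
    pass                        # first run dies mid-migration

# The table exists, but the version counter never moved ...
print(db["version"], db["tables"])  # 0 {'instances'}

# ... so a retry re-runs the same script and fails like node-8 did:
try:
    db_sync(db, {1: create_instances_table})
except RuntimeError as exc:
    print(exc)
```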

Unfortunately, atop logs start from 10:14, so I can't see *why* the node was so busy that "nova-manage db sync" did not manage to finish in time. Most likely it's an overloaded environment (hdd?).

Anyway, there is nothing nova-specific about this failure. Nova just happens to have one of the largest DB schemas.
