Comment 5 for bug 1595892

Roman Podoliaka (rpodolyaka) wrote :

From the Puppet logs you can see that the "nova-manage db sync" call timed out:

2016-06-24 07:53:28 +0000 /Stage[main]/Nova::Deps/Anchor[nova::config::end] (notice): Triggered 'refresh' from 104 events
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Failed to call refresh: Command exceeded timeout
2016-06-24 07:58:28 +0000 /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync] (err): Command exceeded timeout
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `waitpid2'
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:186:in `execute'

I assume that Puppet forcefully stops the child process in this case. This is confirmed by the snippet Slava provided above:

root@node-8:~# nova-manage db sync
error: (_mysql_exceptions.OperationalError) (1050, "Table 'instances' already exists")

^ The error above only appears when a schema migration failed partway through and you then run the same migration script again. Normally that script would be skipped, because db sync first checks the current schema version and skips all migration scripts that have already been applied. In your case the migration must have been interrupted *right before* the version counter in the DB was updated. Unfortunately, MySQL does not support transactional DDL, so a partially applied migration cannot be rolled back: such errors are fatal, and you have to clean up the database manually after a failed migration. Puppet should really stop executing the manifests at this point.
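To illustrate the failure mode, here is a minimal toy sketch. It is not the actual nova or migration framework code. It assumes a hypothetical `migrate_version` table and a hypothetical `migration_1` script, and it uses SQLite in autocommit mode only to imitate MySQL, where DDL cannot be rolled back (SQLite itself can roll back DDL).

```python
import sqlite3


def migration_1(conn, interrupt=False):
    """Hypothetical migration script #1 (illustration only)."""
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY)")
    if interrupt:
        # Stand-in for the process being killed when the timeout expires.
        raise RuntimeError("interrupted before the version counter was updated")
    conn.execute("CREATE TABLE flavors (id INTEGER PRIMARY KEY)")


MIGRATIONS = {1: migration_1}


def db_sync(conn, interrupt=False):
    """Apply every migration newer than the recorded schema version."""
    current = conn.execute("SELECT version FROM migrate_version").fetchone()[0]
    for version in sorted(MIGRATIONS):
        if version <= current:
            continue  # already applied, so it is skipped
        MIGRATIONS[version](conn, interrupt=interrupt)
        # The counter is only bumped after the whole script has finished.
        conn.execute("UPDATE migrate_version SET version = ?", (version,))


def run_demo():
    # isolation_level=None puts the sqlite3 module in autocommit mode:
    # each statement takes effect immediately and nothing is rolled back.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE migrate_version (version INTEGER NOT NULL)")
    conn.execute("INSERT INTO migrate_version VALUES (0)")

    errors = []
    try:
        db_sync(conn, interrupt=True)  # first run is killed midway
    except RuntimeError as exc:
        errors.append(str(exc))
    try:
        db_sync(conn)  # second run re-executes the same script
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))
    conn.close()
    return errors


if __name__ == "__main__":
    for message in run_demo():
        print(message)
```

The second run fails on the first CREATE TABLE because the table is left over from the interrupted run while the version counter still says 0, which is the same pattern as the 1050 error above.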

Unfortunately, atop logs start from 10:14, so I can't see *why* the node was so busy that "nova-manage db sync" did not manage to finish in time. Most likely it's an overloaded environment (hdd?).

Anyway, there is nothing nova-specific about this failure. Nova just happens to have one of the largest DB schemas.