Nova crashes with DatabaseNotControlledError

Bug #1645474 reported by Sergey Novikov
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released   High        Anton Chevychalov
  Newton            Fix Committed  High        Anton Chevychalov

Bug Description

Detailed bug description:
The issue was found by https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.plugins.fuel_plugin_example/142/testReport/(root)/deploy_neutron_example_ha/deploy_neutron_example_ha/
Steps to reproduce:
            1. Upload the example plugin to the master node
            2. Install the plugin
            3. Create a cluster
            4. Add 3 nodes with the controller role
            5. Add 1 node with the compute role
            6. Add 1 node with the cinder role
            7. Deploy the cluster
            8. Run network verification
            9. Check plugin health
            10. Run OSTF
Expected result: every step succeeds
Actual result: the following tests fail
  - Create volume and boot instance from it
  - Create volume and attach it to instance
  - Check network connectivity from instance via floating IP
  - Launch instance with file injection
  - Launch instance, create snapshot, launch instance from snapshot
The tests fail with: http://paste.openstack.org/show/590712/
nova-api log: http://paste.openstack.org/show/590713/
nova-manage log: http://paste.openstack.org/show/590714/

Description of the environment:
snapshot #566

Tags: swarm-fail
Revision history for this message
Sergey Novikov (snovikov) wrote :
Changed in fuel:
importance: Undecided → High
Changed in fuel:
assignee: nobody → MOS Maintenance (mos-maintenance)
Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → nobody
Changed in fuel:
assignee: nobody → Anton Chevychalov (achevychalov)
Revision history for this message
Anton Chevychalov (achevychalov) wrote :

Something went wrong with WSREP during deployment:

http://paste.openstack.org/show/591124/

That caused an incomplete migration of the nova DB.

Revision history for this message
Anton Chevychalov (achevychalov) wrote :

This is the actual scenario and the real cause of the problem:

Facts:

1. We have the primary-openstack-controller task.
2. That task is repeatable. If it fails, it is re-triggered.
3. That task triggers Nova installation and configuration.
4. Installation and configuration trigger sync-db.
5. sync-db cannot repeat a step if something goes wrong with the connection to the DB.

So this is the failure scenario:

1. sync-db was running while WSREP was starting and re-balancing nodes.
2. The MySQL node on the primary controller failed while sync-db was running.
3. sync-db could not recover and failed.
4. The primary-openstack-controller task failed as well.
5. The primary-openstack-controller task was triggered once more.
6. Nova installation was not triggered because Nova was already installed and configured.
7. sync-db was not triggered either, because it depends on the Nova installation process.
8. The primary-openstack-controller task succeeded because there were no failures.
9. So the deployment was marked successful too.
10. But the nova database had not been populated and was broken.
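
The failure scenario above can be reproduced with a toy model. This is only a sketch in Python, not Fuel or Puppet code: FakeNode, run_task and deploy are hypothetical names, and the model assumes that sync-db runs only when the install/configure step reports a change.

```python
class FakeNode:
    """Toy model of the primary controller; not real Fuel/Puppet code."""

    def __init__(self, db_failures):
        self.nova_configured = False
        self.db_synced = False
        # Number of sync attempts that will hit a failing MySQL node.
        self._db_failures = db_failures

    def sync_db(self):
        if self._db_failures > 0:
            self._db_failures -= 1
            raise ConnectionError("lost connection to MySQL during sync")
        self.db_synced = True


def run_task(node):
    """One run of the (modelled) primary-openstack-controller task."""
    changed = False
    if not node.nova_configured:
        # Install and configure Nova; this is the only thing that
        # triggers sync-db in this model.
        node.nova_configured = True
        changed = True
    if changed:
        node.sync_db()


def deploy(node, max_runs=2):
    """Re-trigger the task on failure, as the repeatable task does."""
    for _ in range(max_runs):
        try:
            run_task(node)
            return "success"
        except ConnectionError:
            continue
    return "failed"


node = FakeNode(db_failures=1)
print(deploy(node), node.db_synced)
```

With one DB failure during the first run, the second run finds Nova already configured, skips sync-db, and reports success while db_synced is still False.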

Possible solutions:
1. Always trigger sync-db in the primary-openstack-controller task.
2. Add a check of the current DB state.
3. Somehow fix the WSREP and MySQL startup procedure.
4. Fix sync-db so that it can repeat steps.
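
As an illustration of solutions 2 and 4 combined, here is a minimal Python sketch that keeps running the sync until a state check passes. It is not the fix that was merged. The callbacks run_sync and schema_is_current are hypothetical: the caller supplies them, and run_sync is assumed to raise ConnectionError on a transient DB failure.

```python
import time


def ensure_db_synced(run_sync, schema_is_current,
                     attempts=5, delay=10.0, sleep=time.sleep):
    """Run run_sync until schema_is_current() is true.

    Returns the number of sync runs that were needed (0 if the
    schema was already current). Raises RuntimeError if the schema
    is still not current after `attempts` sync runs.
    """
    runs = 0
    while not schema_is_current():
        if runs >= attempts:
            raise RuntimeError(
                "database still not migrated after %d sync runs" % runs)
        if runs:
            sleep(delay)  # back off before retrying
        runs += 1
        try:
            run_sync()
        except ConnectionError:
            # Transient DB failure: fall through, re-check the state
            # and retry instead of giving up.
            pass
    return runs
```

Because the state check comes before every run, calling this on each task run is cheap when the schema is already current, and a re-triggered task does not silently skip an unfinished migration.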

Revision history for this message
Anton Chevychalov (achevychalov) wrote :

That problem is already fixed in master: https://review.openstack.org/#/c/379217/

Changed in fuel:
status: New → In Progress
Revision history for this message
Anton Chevychalov (achevychalov) wrote :

A fix was created for stable/mitaka: https://review.openstack.org/#/c/406104

Revision history for this message
Anton Chevychalov (achevychalov) wrote :

The fix that was created for bug #1628580 is incomplete.

Revision history for this message
Anton Chevychalov (achevychalov) wrote :

While the original issue has been fixed, we intend to handle the Keystone patch for Mitaka in a separate bug:
https://bugs.launchpad.net/fuel/+bug/1655968

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.2 snapshot #771.

The latest runs with the fix have passed since snapshot #724 (runs #171-#179):
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.plugins.fuel_plugin_example/

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-aodh 9.5.0

This issue was fixed in the openstack/puppet-aodh 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-ceilometer 9.5.0

This issue was fixed in the openstack/puppet-ceilometer 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-cinder 9.5.0

This issue was fixed in the openstack/puppet-cinder 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-glance 9.5.0

This issue was fixed in the openstack/puppet-glance 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-heat 9.5.0

This issue was fixed in the openstack/puppet-heat 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-ironic 9.5.0

This issue was fixed in the openstack/puppet-ironic 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-keystone 9.5.0

This issue was fixed in the openstack/puppet-keystone 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-murano 9.5.0

This issue was fixed in the openstack/puppet-murano 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-neutron 9.5.0

This issue was fixed in the openstack/puppet-neutron 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-nova 9.5.0

This issue was fixed in the openstack/puppet-nova 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-sahara 9.5.0

This issue was fixed in the openstack/puppet-sahara 9.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-barbican 9.6.0

This issue was fixed in the openstack/puppet-barbican 9.6.0 release.
