Fuel for OpenStack

If keystone-manage db_sync fails first time, deployment cannot proceed

Bug #1592819 reported by Alexandr Kostrikov on 2016-06-15

This bug affects 3 people

Affects             Status         Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Committed  High        Alex Schultz  Fuel for OpenStack 10.0
Mitaka              Fix Released   High        Alex Schultz  Fuel for OpenStack 9.1

Bug Description

During a SWARM run [0] there was an error in the separate all services scenario:
            1. Create cluster
            2. Add 3 nodes with controller role
            3. Add 3 nodes with database, keystone, rabbit,
               horizon
            4. Add 1 compute and cinder
            5. Verify networks
            6. Deploy the cluster
            7. Verify networks
            8. Run OSTF

The deployment failed with the error:
AssertionError: Cluster is not deployed: some nodes are in the Error state

On the detached database/keystone/rabbit there was an error stating that the `keystone.project` table does not exist:
http://paste.openstack.org/show/516241/

Puppet logs show that only db_sync has been executed and no bootstrap:
http://paste.openstack.org/show/516242/

There are no calls to bootstrap in the manifest:
http://paste.openstack.org/show/516249/

[0] https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.plugins.thread_2_separate_services/143/testReport/(root)/separate_all_service/separate_all_service/

Alexandr Kostrikov (akostrikov-mirantis) wrote on 2016-06-15:

fail_error_separate_all_service-fuel-snapshot-2016-06-14_23-55-32.tar.gz (23.3 MiB, application/x-tar)

Alexandr Kostrikov (akostrikov-mirantis) on 2016-06-15

tags:

added: swarm-blocker

Bug Checker Bot (bug-checker) wrote on 2016-06-15: Autochecker

(This check performed automatically)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags:

added: need-info

Alexandr Kostrikov (akostrikov-mirantis) wrote on 2016-06-15: Re: No `keystone-manage bootstrap` in openstack_tasks/manifests/keystone/keystone.pp in detached services deployment

Setting it to swarm-blocker because it has blocked two more tests.

Changed in fuel:
assignee:	Fuel Toolbox (fuel-toolbox) → Matthew Mosesohn (raytrac3r)

Matthew Mosesohn (raytrac3r) wrote on 2016-06-15:

The root cause is that Galera wasn't ready when keystone-manage db_sync ran. We can try to research how to trigger this across puppet applies, but right now it is refreshonly and can't be retriggered easily.
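
To illustrate why a failed refreshonly exec is not retried on later runs, here is a toy Python model. It is not Puppet code and is not taken from fuel-library; `RefreshOnlyExec`, `puppet_run`, and the `state` dictionary are hypothetical names. The model assumes the exec only runs when a resource that notifies it changes during the same run.

```python
class RefreshOnlyExec:
    """Toy model of an exec that runs only when it receives a refresh event."""

    def __init__(self, command):
        self.command = command

    def apply(self, notified):
        if not notified:
            return "skipped"  # no refresh event, so the command never runs
        return "ok" if self.command() else "failed"


def puppet_run(config_in_sync, db_sync):
    """One simulated run: the config resource notifies db_sync only if it changed."""
    changed = not config_in_sync
    return db_sync.apply(notified=changed)


# Hypothetical stand-in for database availability.
state = {"galera_ready": False}


def keystone_db_sync():
    # Succeeds only when the database is reachable.
    return state["galera_ready"]


db_sync = RefreshOnlyExec(keystone_db_sync)

# First run: config is applied and notifies db_sync, but Galera is not ready.
first = puppet_run(config_in_sync=False, db_sync=db_sync)

# Second run: Galera is ready now, but the config is already in sync,
# so nothing notifies db_sync and it is never attempted again.
state["galera_ready"] = True
second = puppet_run(config_in_sync=True, db_sync=db_sync)

print(first, second)  # failed skipped
```

Under these assumptions, repeating the run does not help, because the event that would trigger db_sync was consumed by the first, failed attempt.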

summary:

- No `keystone-manage bootstrap` in
- openstack_tasks/manifests/keystone/keystone.pp in detached services
- deployment
+ If keystone-manage db_sync fails first time, deployment cannot proceed

OpenStack Infra (hudson-openstack) wrote on 2016-06-15: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/330025

Changed in fuel:
status:	New → In Progress

Alex Schultz (alex-schultz) wrote on 2016-06-15:

I guess this could just be a duplicate of Bug 1592401 since it's probably the same underlying problem. The proposed fix is one workaround but doesn't necessarily address the core issue.

Maksim Malchuk (mmalchuk) on 2016-06-15

no longer affects:

fuel/newton

OpenStack Infra (hudson-openstack) wrote on 2016-06-16: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/330025
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=421bf72dc2e65fa2afcb26c1234e44b59e127ccc
Submitter: Jenkins
Branch: master

commit 421bf72dc2e65fa2afcb26c1234e44b59e127ccc
Author: Matthew Mosesohn <email address hidden>
Date: Wed Jun 15 18:47:01 2016 +0300

Add retries for keystone-manage tasks

    keystone-manage db_sync fails permanently without any
    chance to recover via additional puppet runs if it cannot
    complete successfully on the first run. Adding retries
    reduces the likelihood that deployment fails.

Change-Id: Ie6c31d7c1d0f8be331c5cc878328eea630e57e0c
Closes-Bug: #1592819
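
The retry idea in the commit message above can be sketched in Python as follows. This is only an illustration, not the merged change (see the linked review for that). The helper `run_with_retries` and the `flaky_db_sync` stand-in are hypothetical; the parameter names `tries` and `try_sleep` are borrowed from Puppet's exec resource, and the values used here are assumptions.

```python
import time


def run_with_retries(command, tries=3, try_sleep=0.0, sleep=time.sleep):
    """Call command() up to `tries` times, sleeping `try_sleep` seconds between attempts."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return command()
        except Exception as exc:  # a real caller would catch something narrower
            last_error = exc
            if attempt < tries:
                sleep(try_sleep)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# Hypothetical command that fails twice (database not ready) and then succeeds.
attempts = {"count": 0}


def flaky_db_sync():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("Galera not ready")
    return "synced"


result = run_with_retries(flaky_db_sync, tries=5, try_sleep=0)
print(result, attempts["count"])  # synced 3
```

Retries like this only reduce the chance of failure within a single run. If every attempt fails, the situation is the same as before the change.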

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-06-16: Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/330447

Matthew Mosesohn (raytrac3r) wrote on 2016-06-20:

This bug requires a release note because the fix could not be merged for the 9.0 release.

Release note text: In some circumstances keystone-manage db_sync fails because the database is unavailable temporarily. Because of how keystone-manage db_sync is triggered, it can never be re-attempted via repeated deploy attempts. The only workaround is to reset your environment and repeat deployment. This will not affect scale up/down scenarios. Its scope is limited to initial deployment failure.

tags:

added: release-notes
removed: need-info

Alex Schultz (alex-schultz) wrote on 2016-06-21:

#10

This is still occurring: the community BVT #266 https://ci.fuel-infra.org/job/10.0-community.main.ubuntu.bvt_2/266/ hit this, and the retries add new problems where the tables already exist.
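
The thread does not say exactly how the tables came to exist before the retry. One plausible mechanism, shown here only as an assumption, is that a schema step is interrupted after some tables have been created and the retry then starts from the beginning. The sketch uses Python's sqlite3 in autocommit mode as a stand-in for the real database; `migrate` and the table definitions are hypothetical and much simpler than what keystone-manage db_sync does.

```python
import sqlite3


def migrate(conn, interrupt_after_first=False):
    """Hypothetical two-step schema creation that is not safe to repeat."""
    conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY)")
    if interrupt_after_first:
        # Simulate the database dropping out part way through.
        raise ConnectionError("database unavailable mid-migration")
    conn.execute("CREATE TABLE role (id TEXT PRIMARY KEY)")


# isolation_level=None puts the connection in autocommit mode,
# so each CREATE TABLE takes effect immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)

# First attempt: interrupted after the first table was created.
try:
    migrate(conn, interrupt_after_first=True)
except ConnectionError:
    pass

# Retry: starts over and trips on the table left behind by the first attempt.
try:
    migrate(conn)
    retry_error = None
except sqlite3.OperationalError as exc:
    retry_error = str(exc)

print(retry_error)  # table project already exists
```

If this is what happened, blind retries cannot recover; the step itself would need to be safe to repeat, or the database would need to be ready before the first attempt.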

Changed in fuel:
status:	Fix Committed → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2016-06-27: Related fix merged to fuel-library (master)

#11

Reviewed: https://review.openstack.org/330254
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=26f1690d10a5c8f401033dff00166c14bf77d4ac
Submitter: Jenkins
Branch: master

commit 26f1690d10a5c8f401033dff00166c14bf77d4ac
Author: Alex Schultz <email address hidden>
Date: Wed Jun 15 16:25:16 2016 -0600

Stop looking for master once latest seqno found

    This change updates how we look for a master by stopping once we have
    found a service with the largest seqno. Previously if all servers
    had the same seqno then we would return the last server as a master
    rather than the first. This had the side effect that during bootstrap
    when the first server was the original master, it would be demoted for
    each new server that was added to the group. This should prevent the
    ocf script from shuffling masters if they have the same seqno. We will
    always pick the first server rather than the last.

    Change-Id: Iacbd2e2ec403985a1ff52880669b1bec62dbbaba
    Closes-Bug: #1592401
    Related-Bug: #1592819
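
The tie-breaking change described in the commit message above can be sketched in Python. The real logic lives in the OCF script, which is not reproduced here; the function names, the node names, and the `(name, seqno)` tuple layout are assumptions made for illustration, and the node list is assumed to be non-empty.

```python
def pick_master_last_wins(nodes):
    """Old behaviour as described: on equal seqno the last node wins."""
    master, best = None, None
    for name, seqno in nodes:
        if best is None or seqno >= best:
            master, best = name, seqno
    return master


def pick_master_first_wins(nodes):
    """New behaviour as described: stop at the first node with the largest seqno."""
    largest = max(seqno for _, seqno in nodes)
    for name, seqno in nodes:
        if seqno == largest:
            return name  # stop looking once the latest seqno is found


# During bootstrap every node may report the same seqno.
nodes = [("node-1", 5), ("node-2", 5), ("node-3", 5)]
print(pick_master_last_wins(nodes))   # node-3
print(pick_master_first_wins(nodes))  # node-1
```

With the old rule, each node added with the same seqno would displace the current master. With the new rule, the first node keeps the role.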

OpenStack Infra (hudson-openstack) wrote on 2016-07-05: Related fix proposed to fuel-library (stable/mitaka)

#12

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337717

Matthew Mosesohn (raytrac3r) on 2016-07-13

Changed in fuel:
assignee:	Matthew Mosesohn (raytrac3r) → Alex Schultz (alex-schultz)

Alex Schultz (alex-schultz) on 2016-07-13

Changed in fuel:
status:	Confirmed → Fix Committed

Dmitry Pyzhov (dpyzhov) on 2016-07-20

tags:

added: 9.1-proposed

OpenStack Infra (hudson-openstack) wrote on 2016-07-28: Related fix merged to fuel-library (stable/mitaka)

#13

Reviewed: https://review.openstack.org/337717
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=878ad54a04ff5bcb0b8a2650b55ac3f1498dca76
Submitter: Jenkins
Branch: stable/mitaka

commit 878ad54a04ff5bcb0b8a2650b55ac3f1498dca76
Author: Alex Schultz <email address hidden>
Date: Wed Jun 15 16:25:16 2016 -0600

Stop looking for master once latest seqno found

    Change-Id: Iacbd2e2ec403985a1ff52880669b1bec62dbbaba
    Closes-Bug: #1592401
    Related-Bug: #1592819
    (cherry picked from commit 26f1690d10a5c8f401033dff00166c14bf77d4ac)

tags:

added: in-stable-mitaka

Sergey Shevorakov (sshevorakov) wrote on 2016-08-16:

#14

Not a swarm-blocker anymore.
Last run where it appeared: https://mirantis.testrail.com/index.php?/plans/view/13488

tags:

removed: swarm-blocker

Maksim Malchuk (mmalchuk) wrote on 2016-09-07:

#15

Moved to the Invalid status; feel free to reopen if it appears again.

Maria Zlatkova (mzlatkova) on 2016-09-09

tags:

added: release-notes-done
removed: release-notes

OpenStack Infra (hudson-openstack) wrote on 2016-09-23: Fix merged to fuel-library (stable/mitaka)

#16

Reviewed: https://review.openstack.org/330447
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=fb94965a9a3fe0f4845c917ce770a54b040442e6
Submitter: Jenkins
Branch: stable/mitaka

commit fb94965a9a3fe0f4845c917ce770a54b040442e6
Author: Matthew Mosesohn <email address hidden>
Date: Wed Jun 15 18:47:01 2016 +0300

Add retries for keystone-manage tasks

    Change-Id: Ie6c31d7c1d0f8be331c5cc878328eea630e57e0c
    Closes-Bug: #1592819
    (cherry picked from commit 421bf72dc2e65fa2afcb26c1234e44b59e127ccc)