If keystone-manage db_sync fails first time, deployment cannot proceed

Bug #1592819 reported by Alexandr Kostrikov
This bug affects 3 people
Affects             Status         Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Committed  High        Alex Schultz
Mitaka              Fix Released   High        Alex Schultz

Bug Description

During a SWARM run [0] there was an error in the separate all services scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 3 nodes with database, keystone, rabbit, horizon
4. Add 1 compute and cinder
5. Verify networks
6. Deploy the cluster
7. Verify networks
8. Run OSTF

The deployment failed with the error:
AssertionError: Cluster is not deployed: some nodes are in the Error state

On the detached database/keystone/rabbit node there was an error stating that the `keystone.project` table does not exist:
http://paste.openstack.org/show/516241/

Puppet logs show that only db_sync has been executed and no bootstrap:
http://paste.openstack.org/show/516242/

There are no calls to bootstrap in the manifest:
http://paste.openstack.org/show/516249/

[0] https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.plugins.thread_2_separate_services/143/testReport/(root)/separate_all_service/separate_all_service/

Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :
tags: added: swarm-blocker
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote : Re: No `keystone-manage bootstrap` in openstack_tasks/manifests/keystone/keystone.pp in detached services deployment

Setting it to swarm blocker because it has blocked two more tests.

Changed in fuel:
assignee: Fuel Toolbox (fuel-toolbox) → Matthew Mosesohn (raytrac3r)
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

The root cause is that Galera wasn't ready when keystone-manage db_sync ran. We can try to research how to trigger this across puppet applies, but right now it is refreshonly and can't be retriggered easily.
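To illustrate why a refreshonly exec cannot recover on a later puppet apply, here is a minimal Python model. It is not Puppet code, and the class, function, and config names are hypothetical. It assumes db_sync is notified by a resource (here a config value) that only emits a refresh event when it changes.

```python
class RefreshOnlyExec:
    """Toy model of a Puppet exec with refreshonly => true."""

    def __init__(self, command):
        self.command = command  # callable returning True on success
        self.attempts = 0

    def apply(self, refresh_event):
        # A refreshonly exec runs only when an upstream resource changed.
        if not refresh_event:
            return "skipped"
        self.attempts += 1
        return "ok" if self.command() else "failed"


def puppet_run(config_on_disk, desired_config, db_sync):
    """One simulated apply: write the config if it differs, then notify db_sync."""
    changed = config_on_disk.get("value") != desired_config
    if changed:
        config_on_disk["value"] = desired_config
    return db_sync.apply(refresh_event=changed)


# Hypothetical scenario: the database is unavailable during the first run only.
db_ready = {"up": False}
db_sync = RefreshOnlyExec(lambda: db_ready["up"])
disk = {}

first = puppet_run(disk, "keystone.conf v1", db_sync)   # config changes, sync fails
db_ready["up"] = True
second = puppet_run(disk, "keystone.conf v1", db_sync)  # nothing changes, sync is skipped
print(first, second)
```

In this model the second run never attempts the sync, because the config is already in its desired state and no refresh event is sent, even though the database is now available.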

summary: - No `keystone-manage bootstrap` in
- openstack_tasks/manifests/keystone/keystone.pp in detached services
- deployment
+ If keystone-manage db_sync fails first time, deployment cannot proceed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/330025

Changed in fuel:
status: New → In Progress
Revision history for this message
Alex Schultz (alex-schultz) wrote :

I guess this could just be a duplicate of Bug 1592401, since it's probably the same underlying problem. The proposed fix is one workaround but doesn't necessarily address the core issue.

no longer affects: fuel/newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/330025
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=421bf72dc2e65fa2afcb26c1234e44b59e127ccc
Submitter: Jenkins
Branch: master

commit 421bf72dc2e65fa2afcb26c1234e44b59e127ccc
Author: Matthew Mosesohn <email address hidden>
Date: Wed Jun 15 18:47:01 2016 +0300

    Add retries for keystone-manage tasks

    keystone-manage db_sync fails permanently without any
    chance to recover via additional puppet runs if it cannot
    complete successfully on the first run. Adding retries
    reduces the likelihood that deployment fails.

    Change-Id: Ie6c31d7c1d0f8be331c5cc878328eea630e57e0c
    Closes-Bug: #1592819
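
The retry approach described in the commit message can be sketched in Python as follows. This is an illustration only, not the change itself: the parameter names mirror Puppet's exec `tries` and `try_sleep` attributes, while the function, the values, and the stand-in command are assumptions.

```python
import time


def run_with_retries(command, tries=3, try_sleep=0.0, sleep=time.sleep):
    """Call command() up to `tries` times, pausing `try_sleep` seconds between attempts.

    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        if command():
            return attempt
        if attempt < tries:
            sleep(try_sleep)
    raise RuntimeError(f"command failed after {tries} attempts")


# Hypothetical stand-in for `keystone-manage db_sync`: fails twice, then succeeds.
outcomes = iter([False, False, True])
succeeded_on = run_with_retries(lambda: next(outcomes), tries=5, try_sleep=0.0)
print(succeeded_on)
```

Retries only narrow the failure window: if the database stays unavailable for longer than `tries * try_sleep`, the command still fails.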

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/330447

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This bug requires a release note because the fix could not be merged for the 9.0 release.

Release note text: In some circumstances keystone-manage db_sync fails because the database is unavailable temporarily. Because of how keystone-manage db_sync is triggered, it can never be re-attempted via repeated deploy attempts. The only workaround is to reset your environment and repeat deployment. This will not affect scale up/down scenarios. Its scope is limited to initial deployment failure.

tags: added: release-notes
removed: need-info
Revision history for this message
Alex Schultz (alex-schultz) wrote :

This is still occurring. The community bvt #266 https://ci.fuel-infra.org/job/10.0-community.main.ubuntu.bvt_2/266/ hit this, and the retries add new problems where the tables already exist.
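
As an analogy for the "tables already exist" problem, the sketch below shows how retrying a schema change that is not idempotent fails on the second attempt. It uses an in-memory SQLite database rather than Galera, and the table definition and function names are hypothetical, not keystone's migration code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def naive_migration(db):
    # Not idempotent: fails if a previous attempt already created the table.
    db.execute("CREATE TABLE project (id TEXT PRIMARY KEY)")


def idempotent_migration(db):
    # Safe to repeat: a second run is a no-op.
    db.execute("CREATE TABLE IF NOT EXISTS project (id TEXT PRIMARY KEY)")


naive_migration(conn)          # first attempt creates the table
try:
    naive_migration(conn)      # the retry hits the leftover table
    retry_error = None
except sqlite3.OperationalError as exc:
    retry_error = str(exc)

idempotent_migration(conn)     # repeating this variant does not raise
print(retry_error)
```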

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/330254
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=26f1690d10a5c8f401033dff00166c14bf77d4ac
Submitter: Jenkins
Branch: master

commit 26f1690d10a5c8f401033dff00166c14bf77d4ac
Author: Alex Schultz <email address hidden>
Date: Wed Jun 15 16:25:16 2016 -0600

    Stop looking for master once latest seqno found

    This change updates how we look for a master by stopping once we have
    found a service with the largest seqno. Previously if all servers
    had the same seqno then we would return the last server as a master
    rather than the first. This had the side effect that during bootstrap
    when the first server was the original master, it would be demoted for
    each new server that was added to the group. This should prevent the
    ocf script from shuffling masters if they have the same seqno. We will
    always pick the first server rather than the last.

    Change-Id: Iacbd2e2ec403985a1ff52880669b1bec62dbbaba
    Closes-Bug: #1592401
    Related-Bug: #1592819
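
The tie-breaking change described in the commit message can be sketched in Python as follows. This is not the OCF script; the function names, node names, and seqno values are hypothetical, and the comparison operators are only one plausible way to produce the described behaviour.

```python
def pick_master_last(nodes):
    """Old behaviour as described: ties on seqno go to the last node."""
    master = None
    for name, seqno in nodes:
        if master is None or seqno >= master[1]:
            master = (name, seqno)
    return master[0] if master else None


def pick_master_first(nodes):
    """New behaviour as described: ties on seqno go to the first node."""
    master = None
    for name, seqno in nodes:
        if master is None or seqno > master[1]:
            master = (name, seqno)
    return master[0] if master else None


# Hypothetical cluster where every node reports the same seqno.
equal = [("node-1", 42), ("node-2", 42), ("node-3", 42)]
print(pick_master_last(equal), pick_master_first(equal))
```

With equal seqnos, the first variant moves the master to whichever node was added last, while the second keeps the original master in place.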

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337717

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Alex Schultz (alex-schultz)
Changed in fuel:
status: Confirmed → Fix Committed
Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/337717
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=878ad54a04ff5bcb0b8a2650b55ac3f1498dca76
Submitter: Jenkins
Branch: stable/mitaka

commit 878ad54a04ff5bcb0b8a2650b55ac3f1498dca76
Author: Alex Schultz <email address hidden>
Date: Wed Jun 15 16:25:16 2016 -0600

    Stop looking for master once latest seqno found

    This change updates how we look for a master by stopping once we have
    found a service with the largest seqno. Previously if all servers
    had the same seqno then we would return the last server as a master
    rather than the first. This had the side effect that during bootstrap
    when the first server was the original master, it would be demoted for
    each new server that was added to the group. This should prevent the
    ocf script from shuffling masters if they have the same seqno. We will
    always pick the first server rather than the last.

    Change-Id: Iacbd2e2ec403985a1ff52880669b1bec62dbbaba
    Closes-Bug: #1592401
    Related-Bug: #1592819
    (cherry picked from commit 26f1690d10a5c8f401033dff00166c14bf77d4ac)

tags: added: in-stable-mitaka
Revision history for this message
Sergey Shevorakov (sshevorakov) wrote :

Not a swarm-blocker anymore.
Last run where it appeared: https://mirantis.testrail.com/index.php?/plans/view/13488

tags: removed: swarm-blocker
Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

Moved to the Invalid status; feel free to reopen if it appears again.

tags: added: release-notes-done
removed: release-notes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/330447
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=fb94965a9a3fe0f4845c917ce770a54b040442e6
Submitter: Jenkins
Branch: stable/mitaka

commit fb94965a9a3fe0f4845c917ce770a54b040442e6
Author: Matthew Mosesohn <email address hidden>
Date: Wed Jun 15 18:47:01 2016 +0300

    Add retries for keystone-manage tasks

    keystone-manage db_sync fails permanently without any
    chance to recover via additional puppet runs if it cannot
    complete successfully on the first run. Adding retries
    reduces the likelihood that deployment fails.

    Change-Id: Ie6c31d7c1d0f8be331c5cc878328eea630e57e0c
    Closes-Bug: #1592819
    (cherry picked from commit 421bf72dc2e65fa2afcb26c1234e44b59e127ccc)

Revision history for this message
Alexey. Kalashnikov (akalashnikov) wrote :

Not reproduced on swarm 9.1

tags: added: on-verification
tags: removed: on-verification