HA deployment failed after adding two new controllers with Failed to call refresh: swift-ring-builder /etc/swift/account.builder rebalance returned 1 instead of one of [0]

Bug #1402701 reported by Andrey Sledzinskiy
This bug affects 2 people
Affects             Status        Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released  Critical    Aleksandr Didenko
6.0.x               Fix Released  Critical    Aleksandr Didenko
6.1.x               Fix Released  Critical    Aleksandr Didenko

Bug Description

{
    "build_id": "2014-12-09_22-41-06",
    "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4",
    "build_number": "49",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1",
    "production": "docker",
    "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee",
    "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.0",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "build_id": "2014-12-09_22-41-06",
                "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4",
                "build_number": "49",
                "api": "1.0",
                "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1",
                "production": "docker",
                "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee",
                "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.0",
                "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"
            }
        }
    },
    "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"
}

Steps:
1. Create cluster - HA, CentOS, Flat nova-network, 1 controller
2. Deploy cluster
3. Add 2 new controllers
4. Deploy changes
5. Add 2 new controllers
6. Start cluster re-deployment

Actual result: deployment failed on node-2 with:
2014-12-15 06:51:43 ERR (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]) Failed to call refresh: swift-ring-builder /etc/swift/account.builder rebalance returned 1 instead of one of [0]

Logs are attached

Andrey Sledzinskiy (asledzinskiy) wrote :
Aleksandr Didenko (adidenko) wrote :

-----------
2014-12-15T06:20:44 info: [410] Processing RPC call 'deploy'
... deploying 1st primary controller ...
2014-12-15T06:30:57.405142+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
2014-12-15T06:30:57.894101+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Reassigned 512 (100.00%) partitions. Balance is now 0.00.

------------
2014-12-15T06:46:14 info: [418] Processing RPC call 'deploy'
... deploying primary controller +2 new controllers ...
2014-12-15T06:51:43.441763+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
2014-12-15T06:51:43.930819+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) No partitions could be reassigned.
2014-12-15T06:51:43.932191+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Either none need to be or none can be due to min_part_hours [1].

------------
2014-12-15T07:29:33 info: [405] Processing RPC call 'deploy'
... deploying primary controller, +2 controllers, +2 new controllers ...
2014-12-15T07:35:28.645214+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
2014-12-15T07:35:29.196341+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Reassigned 512 (100.00%) partitions. Balance is now 234.64.

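We set min_part_hours=1, so rebalancing the swift ring more often than once per hour is not allowed. That is why the second rebalance, about 21 minutes after the first, could not reassign any partitions and returned 1. In the end we can see this in the puppet logs for node-2: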

2014-12-15T07:35:30.475418+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) NOTE: Balance of 252.86 indicates you should push this
2014-12-15T07:35:30.476298+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) ring, wait at least 1 hours, and rebalance/repush.

We should adapt our scalability system tests to not add new controllers more often than once per hour on environments with swift. We also might need to update our documentation about it.
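The timing in the logs above can be checked with a small sketch. This is a toy model only: it assumes a single "last moved" timestamp for the whole ring, whereas the real ring builder tracks this per partition. The function names are hypothetical; the timestamps and the min_part_hours value are taken from the log excerpts.

```python
from datetime import datetime, timedelta

MIN_PART_HOURS = 1  # value reported in the rebalance output ("min_part_hours [1]")

# Rebalance timestamps copied from the puppet logs quoted above.
rebalances = [
    datetime(2014, 12, 15, 6, 30, 57),
    datetime(2014, 12, 15, 6, 51, 43),
    datetime(2014, 12, 15, 7, 35, 28),
]


def can_reassign(last_move, now, min_part_hours=MIN_PART_HOURS):
    """Toy rule: partitions may move again only after min_part_hours have elapsed."""
    return now - last_move >= timedelta(hours=min_part_hours)


def simulate(times):
    """Return, for each rebalance attempt, whether partitions could be reassigned."""
    results = []
    last_move = None  # hypothetical single timestamp for the whole ring
    for t in times:
        allowed = last_move is None or can_reassign(last_move, t)
        results.append(allowed)
        if allowed:
            last_move = t
    return results


outcome = simulate(rebalances)
for t, allowed in zip(rebalances, outcome):
    print(t.time(), "reassigned" if allowed else "blocked by min_part_hours")
```

Under this model the 06:51:43 attempt is blocked and the 07:35:28 attempt is allowed, which matches the "No partitions could be reassigned" and "Reassigned 512 (100.00%) partitions" messages in the logs.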

Aleksandr Didenko (adidenko) wrote :

OK, it looks like "swift-ring-builder /etc/swift/${name}.builder pretend_min_part_hours_passed" exec is missing after swift module sync with upstream.
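As a rough illustration of the command order described here, the sketch below builds the two swift-ring-builder invocations for a given ring name. The helper names, the `runner` parameter, and the default `/etc/swift` directory argument are assumptions made for this example; the actual fix is a Puppet exec in fuel-library, not Python.

```python
import subprocess


def rebalance_commands(name, swift_dir="/etc/swift"):
    """Return the argv lists to run, in order, for one ring (account, container, object)."""
    builder = f"{swift_dir}/{name}.builder"
    return [
        # Reset the min_part_hours timer so that the rebalance is not blocked.
        ["swift-ring-builder", builder, "pretend_min_part_hours_passed"],
        ["swift-ring-builder", builder, "rebalance"],
    ]


def run_rebalance(name, runner=subprocess.call):
    """Run both commands and return their exit codes (runner is injectable for testing)."""
    return [runner(cmd) for cmd in rebalance_commands(name)]
```

Calling `rebalance_commands("account")` returns the pretend_min_part_hours_passed command followed by the rebalance command for `/etc/swift/account.builder`.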

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/141827

Changed in fuel:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/141828

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/141828
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f81afbbd7075967ce265ac169a89180da715027c
Submitter: Jenkins
Branch: stable/6.0

commit f81afbbd7075967ce265ac169a89180da715027c
Author: Aleksandr Didenko <email address hidden>
Date: Mon Dec 15 17:48:09 2014 +0200

    Add pretend_min_part_hours_passed before rebalance

    Add pretend_min_part_hours_passed before rebalancing swift ring.
    Make sure some dirs are created with proper ownership.

    Closes-bug: #1402701
    Change-Id: Id49a6ee1852e5d65a9994802c5949eac26dc1821

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/141827
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=cc063caf870bfdce46d8c2afe0837e83e8673696
Submitter: Jenkins
Branch: master

commit cc063caf870bfdce46d8c2afe0837e83e8673696
Author: Aleksandr Didenko <email address hidden>
Date: Mon Dec 15 17:48:09 2014 +0200

    Add pretend_min_part_hours_passed before rebalance

    Add pretend_min_part_hours_passed before rebalancing swift ring.
    Make sure some dirs are created with proper ownership.

    Closes-bug: #1402701
    Change-Id: Id49a6ee1852e5d65a9994802c5949eac26dc1821

Mike Scherbakov (mihgen) wrote :

I assume it's done for 6.1?

tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Alexander Kurenyshev (akurenyshev) wrote :

Verified on
{"build_id": "2014-12-18_01-32-01",
"ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4",
"build_number": "56",
"auth_required": true, "api": "1.0",
"nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90",
"production": "docker",
"fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9",
"astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91",
"feature_groups": ["mirantis"], "release": "6.0",
"release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-18_01-32-01",
"ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4",
"build_number": "56",
"api": "1.0",
"nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90",
"production": "docker",
"fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9",
"astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91",
"feature_groups": ["mirantis"], "release": "6.0",
"fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"}}}, "fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"}

Deployment was successful every time.

tags: removed: on-verification
Dennis Dmitriev (ddmitriev) wrote :

ISO version: {u'build_id': u'2014-12-18_01-32-01', u'ostf_sha': u'a9afb68710d809570460c29d6c3293219d3624d4', u'build_number': u'56', u'auth_required': True, u'nailgun_sha': u'5f91157daa6798ff522ca9f6d34e7e135f150a90', u'production': u'docker', u'api': u'1.0', u'fuelmain_sha': u'45caacadb878abfbd9d60e134d72229698b469c9', u'astute_sha': u'16b252d93be6aaa73030b8100cf8c5ca6a970a91', u'feature_groups': [u'mirantis'], u'release': u'6.0', u'release_versions': {u'2014.2-6.0': {u'VERSION': {u'build_id': u'2014-12-18_01-32-01', u'ostf_sha': u'a9afb68710d809570460c29d6c3293219d3624d4', u'build_number': u'56', u'api': u'1.0', u'nailgun_sha': u'5f91157daa6798ff522ca9f6d34e7e135f150a90', u'production': u'docker', u'fuelmain_sha': u'45caacadb878abfbd9d60e134d72229698b469c9', u'astute_sha': u'16b252d93be6aaa73030b8100cf8c5ca6a970a91', u'feature_groups': [u'mirantis'], u'release': u'6.0', u'fuellib_sha': u'73332192a257ea02c40a39885c502ad1ebdf3eda'}}}, u'fuellib_sha': u'73332192a257ea02c40a39885c502ad1ebdf3eda'}

Changed in puppet-swift:
assignee: nobody → Aleksandr Didenko (adidenko)
status: New → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to puppet-modules/puppet-swift (mos-8.0)

Fix proposed to branch: mos-8.0
Change author: Alexander Didenko <email address hidden>
Review: https://review.fuel-infra.org/10990

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to puppet-modules/puppet-swift (mos-8.0)

Reviewed: https://review.fuel-infra.org/10990
Submitter: Denis Egorenko <email address hidden>
Branch: mos-8.0

Commit: 02ed7487ca6a83671ac5caca16987c3b07aa094a
Author: Alexander Didenko <email address hidden>
Date: Thu Sep 17 13:44:09 2015

Make ring rebalance more reliable.

- Sometimes it's needed to rebalance ring before min_part_hours
  passes. So we need to run 'pretend_min_part_hours_passed'
  command before executing rebalance.
- Ring rebalance may exit with '1' exit code if no changes are
  needed in the ring balance. It's normal situation and we should
  not fail puppet catalog execution due to it.
- Directory '/etc/swift/backups' is needed for rebalance.

Closes-bug: #1402701
Change-Id: I6bb1a144d1a5189ca32d407c6a7497aa397fafc5
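The second and third points of this commit message can be sketched as follows. This is a minimal illustration, not the Puppet change itself: the function names are hypothetical, and the sketch does not set directory ownership. Treating exit codes 0 and 1 as acceptable follows the commit message; treating everything else as a failure is an assumption.

```python
import os

# Per the commit message: 1 can mean "no changes needed", so it is not a failure.
ACCEPTED_EXIT_CODES = (0, 1)


def ensure_backup_dir(swift_dir="/etc/swift"):
    """Create <swift_dir>/backups if it is missing and return its path."""
    path = os.path.join(swift_dir, "backups")
    os.makedirs(path, exist_ok=True)
    return path


def check_rebalance_exit(returncode):
    """Raise only for exit codes outside the accepted set (assumed to be real errors)."""
    if returncode not in ACCEPTED_EXIT_CODES:
        raise RuntimeError(f"rebalance failed with exit code {returncode}")
    return returncode
```

With this check, the "returned 1 instead of one of [0]" case from the original report would no longer abort the run.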

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-swift (master)

Change abandoned by Aleksandr Didenko (<email address hidden>) on branch: master
Review: https://review.openstack.org/198695
Reason: Abandoned

Changed in puppet-swift:
assignee: Aleksandr Didenko (adidenko) → nobody
no longer affects: puppet-swift