Swift Ringbuilder rebalance fails

Bug #1305826 reported by Ivan Berezovskiy on 2014-04-10
This bug affects 2 people
Affects               Importance   Assigned to
Fuel for OpenStack    Critical     Vladimir Kuklin
4.1.x                 High         Fuel Library (Deprecated)
5.0.x                 High         Fuel Library (Deprecated)

Bug Description

{"build_id": "2014-04-07_15-04-24", "mirantis": "yes", "build_number": "261", "nailgun_sha": "5efa9d21162cd30394f2f608641c324a80ea43dd", "ostf_sha": "17f2fe6e56452f8e2f01a385be4c4b87bf3698a8", "fuelmain_sha": "0ed7471818bc50699d33e217c28114234c08c8ee", "astute_sha": "183fe05cd59a5ce6d154fa263a5a8bf5f27db0ec", "release": "5.0", "fuellib_sha": "4d942716e3cc8b40ad868d0c4836c6010fbdc42e"}

To reproduce:
1) Deploy HA with 1 controller node:
* Centos or Ubuntu, Neutron+GRE, tagged all
* Savanna, Murano, Ceilometer
* Roles: Controller, Cinder LVM

Swift errors:

ERR
 (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) change from notrun to 0 1 failed: swift-ring-builder /etc/swift/object.builder rebalance returned 2 instead of one of [0,1]

TRACE:
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) An error has occurred during ring validation. Common
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) causes of failure are rings that are empty or do not
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) have enough devices to accommodate the replica count.
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) Original exception message:
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) There are no devices in this ring, or all devices have been deleted
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns)

After that, Service[swift-object-replicator] also fails to start:

(/Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Service[swift-object-replicator]/ensure) change from stopped to running failed: Could not start Service[swift-object-replicator]: Execution of '/usr/bin/swift-init object-replicator start' returned 1:

tags: added: icehouse
Changed in fuel:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.0
tags: added: isehouse
removed: icehouse
tags: added: icehouse
removed: isehouse
Sergey Vasilenko (xenolog) wrote :

I can't reproduce it on Centos.

Vladimir Kuklin (vkuklin) wrote :

This was somehow magically reproduced on Ubuntu, but neither redeployment of the same environment nor manual rebuilding of the Swift rings with the same parameters reproduced the issue again.

The only thing these two environments have in common is that each has a single controller. In that configuration the deployment may end with this error.

Changed in fuel:
status: Confirmed → Incomplete
Anastasia Palkina (apalkina) wrote :
Vladimir Kuklin (vkuklin) wrote :

Confirmed and Triaged:

This happens when the rebalance runs before any devices have been added to the ring. It only slows the deployment down, because Puppet retries and the second attempt succeeds.

<31>Apr 16 11:56:53 node-7 puppet-user[1252]: Executing '/bin/sh -c swift-ring-builder /etc/swift/account.builder rebalance'
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) -------------------------------------------------------------------------------
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) An error has occurred during ring validation. Common
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) causes of failure are rings that are empty or do not
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) have enough devices to accommodate the replica count.
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Original exception message:
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) There are no devices in this ring, or all devices have been deleted
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) -------------------------------------------------------------------------------
<27>Apr 16 11:56:53 node-7 puppet-user[1252]: swift-ring-builder /etc/swift/account.builder rebalance returned 2 instead of one of [0,1]

<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Prefetching swift_ring_builder resources for ring_account_device
<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<29>Apr 16 11:56:54 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) node name: 192.168.1.4:6002
<29>Apr 16 11:56:54 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) available devs: ["1", "2"]
<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<29>Apr 16 11:56:55 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) *** create device: 2
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder add z7-192.168.1.4:6002/2 1'
<29>Apr 16 11:56:56 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) *** create ...
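
For anyone triaging a similar log: the ordering described above can be checked mechanically. The following is a minimal sketch, not part of the fix. It assumes the Puppet debug log contains "Executing '...'" lines shaped like the ones in this excerpt; the function name and the regular expression are illustrative only.

```python
import re

# Matches debug lines such as
#   Executing '/bin/sh -c swift-ring-builder /etc/swift/account.builder rebalance'
#   Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder add z7-...'
# Plain listing calls (no subcommand after the builder file) are ignored.
EXEC_RE = re.compile(
    r"Executing '(?:/bin/sh -c )?(?:/usr/bin/)?swift-ring-builder "
    r"(?P<builder>\S+\.builder) (?P<action>rebalance|add)\b"
)

def rebalance_before_add(lines):
    """Return builder files whose first 'rebalance' comes before their first
    'add', or that have no 'add' at all in the given lines."""
    first_seen = {}  # (builder, action) -> index of the first matching line
    for index, line in enumerate(lines):
        match = EXEC_RE.search(line)
        if match:
            first_seen.setdefault((match["builder"], match["action"]), index)
    suspects = []
    for (builder, action), index in first_seen.items():
        if action != "rebalance":
            continue
        add_index = first_seen.get((builder, "add"))
        if add_index is None or index < add_index:
            suspects.append(builder)
    return sorted(suspects)

# Three lines taken from the excerpt above.
sample = [
    "<31>Apr 16 11:56:53 node-7 puppet-user[1252]: Executing '/bin/sh -c swift-ring-builder /etc/swift/account.builder rebalance'",
    "<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'",
    "<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder add z7-192.168.1.4:6002/2 1'",
]
print(rebalance_before_add(sample))  # ['/etc/swift/account.builder']
```

A non-empty result means the rebalance for that builder file was attempted while the ring was still empty, which matches the "There are no devices in this ring" message.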


Changed in fuel:
status: Incomplete → Triaged
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/88332
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=95576e82c77981d832dac299bf22930719f22bb2
Submitter: Jenkins
Branch: master

commit 95576e82c77981d832dac299bf22930719f22bb2
Author: Vladimir Kuklin <email address hidden>
Date: Wed Apr 16 21:34:47 2014 +0400

    Fix swift-ring-builder ordering

    Tie ring_*_device resource to ring_devices resource
    by setting autorequire, thus making ring_devices resource
    run after ring_*_device and ensure it is before rebalance
    anchor

    How to test:

    1) install swift binaries

    2) run:
    rm /var/lib/puppet/state/graphs/* && puppet apply --graph -vd
    --modulepath /home/vvk/git/fuel/deployment/puppet/ -e
    'anchor{"rebalance_begin":} Ring_devices<||> ->
    Anchor['rebalance_begin'] ring_devices{'all': storages=>
    [{"storage_address"=>"10.10.10.10"}] }'

    3) read /var/lib/puppet/state/graphs/relationships.dot

    The purpose is to set all ring_*_device before
    Anchor['rebalance_begin']

    Change-Id: If9e724342eeeb8399093103a875719b82a429bc5
    Closes-Bug: #1305826
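
What step 3 of the test procedure looks for can be pictured as a reachability question on the relationship graph: is there a dependency path from every ring_*_device to the rebalance? Below is a rough sketch under stated assumptions. The edge lists are hand-written and hypothetical (they are not parsed from relationships.dot), the device title is made up from the address in the test command, and the edge from the anchor to Exec[rebalance_account] is assumed.

```python
from collections import defaultdict

def must_run_before(edges, first, second):
    """True if a dependency path first -> ... -> second exists in the edge list."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    stack, seen = [first], set()
    while stack:
        node = stack.pop()
        if node == second:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

# Hypothetical edges; each pair means "source is applied before target".
before_fix = [
    ("Ring_devices[all]", "Anchor[rebalance_begin]"),
    ("Anchor[rebalance_begin]", "Exec[rebalance_account]"),  # assumed edge
]
# Assumed effect of the change: each ring_*_device is tied in ahead of
# ring_devices, and therefore ahead of the rebalance anchor.
device = "Ring_account_device[10.10.10.10:6002]"  # hypothetical title
after_fix = before_fix + [(device, "Ring_devices[all]")]

print(must_run_before(before_fix, device, "Exec[rebalance_account]"))  # False
print(must_run_before(after_fix, device, "Exec[rebalance_account]"))   # True
```

Without a path, nothing prevents the rebalance from being scheduled before the device is added, which is the race seen in the logs.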

Changed in fuel:
status: In Progress → Fix Committed
Artem Panchenko (apanchenko-8) wrote :

Verified on ISO # 147

api: '1.0'
astute_sha: 3cffebde1e5452f5dbf8f744c6525fc36c7afbf3
build_id: 2014-04-28_01-00-26
build_number: '147'
fuellib_sha: 10d2775b21102b1dcda00e355cc364895d230236
fuelmain_sha: 68bed4f785ff045be6ef16de19213c9a36a42fce
mirantis: 'yes'
nailgun_sha: 4ad5fe676fd7acac83310e97bc84cfa30f8418b4
ostf_sha: 134765fcb5a07dce0cd1bb399b2290c988c3c63b
production: prod
release: '5.0'

Changed in fuel:
status: Fix Committed → Fix Released
no longer affects: fuel/5.0.x

Reviewed: https://review.openstack.org/96845
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d3b4818b7b3ded1e7b52781efb7328ec4d0d17de
Submitter: Jenkins
Branch: stable/4.1

commit d3b4818b7b3ded1e7b52781efb7328ec4d0d17de
Author: Vladimir Kuklin <email address hidden>
Date: Wed Apr 16 21:34:47 2014 +0400

    Fix swift-ring-builder ordering

    Tie ring_*_device resource to ring_devices resource
    by setting autorequire, thus making ring_devices resource
    run after ring_*_device and ensure it is before rebalance
    anchor

    How to test:

    1) install swift binaries

    2) run:
    rm /var/lib/puppet/state/graphs/* && puppet apply --graph -vd
    --modulepath /home/vvk/git/fuel/deployment/puppet/ -e
    'anchor{"rebalance_begin":} Ring_devices<||> ->
    Anchor['rebalance_begin'] ring_devices{'all': storages=>
    [{"storage_address"=>"10.10.10.10"}] }'

    3) read /var/lib/puppet/state/graphs/relationships.dot

    The purpose is to set all ring_*_device before
    Anchor['rebalance_begin']

    Change-Id: If9e724342eeeb8399093103a875719b82a429bc5
    Closes-Bug: #1305826

Fix proposed to branch: master
Review: https://review.openstack.org/107749

Changed in fuel:
status: Confirmed → In Progress

Change abandoned by Vladimir Kuklin (<email address hidden>) on branch: stable/5.0
Review: https://review.openstack.org/107767

Reviewed: https://review.openstack.org/107749
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=65014034df4a6e31438fd5414ea0f6207cf3c4dc
Submitter: Jenkins
Branch: master

commit 65014034df4a6e31438fd5414ea0f6207cf3c4dc
Author: Vladimir Kuklin <email address hidden>
Date: Thu Jul 17 19:21:42 2014 +0400

    Change resource generation type

    Use eval_generate to generate swift rings.
    This will ensure that Ring_devices resource
    is completed only when all the Ring_*_devices
    is finished, thus ensuring that rebalance begins
    only after all the devices were created.

    Also pin services start to rebalance end.

    Change-Id: If8c64a565ebb86f545add145934d566d666ef073
    Closes-bug: #1305826
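
The ordering this commit message describes can be modelled as a completion marker that depends on every generated device, with the rebalance waiting on that marker and the services waiting on the rebalance. This is a simplified sketch of that idea, not Puppet's actual implementation. The marker name and device titles are hypothetical; the rebalance and service names are borrowed from the error messages in the bug description.

```python
from graphlib import TopologicalSorter  # Python 3.9+

PARENT = "Ring_devices[all]"
DONE = "completed_" + PARENT            # hypothetical completion marker
REBALANCE = "Exec[rebalance_object]"
SERVICE = "Service[swift-object-replicator]"

def apply_order(device_titles):
    """Return one valid apply order for the modelled resources."""
    # Each key maps a resource to the set of resources that must run first.
    deps = {title: {PARENT} for title in device_titles}  # children follow parent
    deps[DONE] = set(device_titles) | {PARENT}  # marker waits for every child
    deps[REBALANCE] = {DONE}                    # rebalance waits for the marker
    deps[SERVICE] = {REBALANCE}                 # services pinned to rebalance end
    return list(TopologicalSorter(deps).static_order())

devices = [
    "Ring_object_device[192.168.1.4:6000]",  # hypothetical titles
    "Ring_object_device[192.168.1.5:6000]",
]
order = apply_order(devices)
print(all(order.index(d) < order.index(REBALANCE) for d in devices))  # True
print(order.index(REBALANCE) < order.index(SERVICE))                  # True
```

The relative order of the individual devices is not fixed in this model. Only their position before the rebalance, and the rebalance's position before the service, is guaranteed.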

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/107767
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=b115bcfb57b8a99fc68b39f6be19b29b2e174464
Submitter: Jenkins
Branch: stable/5.0

commit b115bcfb57b8a99fc68b39f6be19b29b2e174464
Author: Vladimir Kuklin <email address hidden>
Date: Thu Jul 17 19:21:42 2014 +0400

    Change resource generation type

    Use eval_generate to generate swift rings.
    This will ensure that Ring_devices resource
    is completed only when all the Ring_*_devices
    is finished, thus ensuring that rebalance begins
    only after all the devices were created.

    Also pin services start to rebalance end.

    Change-Id: If8c64a565ebb86f545add145934d566d666ef073
    Closes-bug: #1305826

Dmitry Pyzhov (dpyzhov) on 2014-07-21
no longer affects: fuel/5.1.x

Verified on both Centos and Ubuntu.

{
    "build_id": "2014-08-26_00-01-17",
    "ostf_sha": "907f25f8fad39b177bf6a66fba9785afa7dd8008",
    "build_number": "478",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "44876ddae29823449e0cbc59428aafa466cbbbc2",
    "production": "docker",
    "fuelmain_sha": "74ad3dd68020aac3042f62c59c137498474ecbee",
    "astute_sha": "bc60b7d027ab244039f48c505ac52ab8eb0a990c",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1",
    "fuellib_sha": "ca5aa450ea3da771c2d5f1e82511450f0b8faf28"
}

The version above is incorrect. The correct one is 5.0.2, build 57.

tags: added: swift
Tom Fifield (fifieldt) on 2015-06-11
Changed in fuel:
status: Fix Committed → Fix Released