Swift Ringbuilder rebalance fails

Bug #1305826 reported by Ivan Berezovskiy
This bug affects 2 people
Affects             Status        Importance  Assigned to                Milestone
Fuel for OpenStack  Fix Released  Critical    Vladimir Kuklin
4.1.x               Fix Released  High        Fuel Library (Deprecated)
5.0.x               Fix Released  High        Fuel Library (Deprecated)

Bug Description

{"build_id": "2014-04-07_15-04-24", "mirantis": "yes", "build_number": "261", "nailgun_sha": "5efa9d21162cd30394f2f608641c324a80ea43dd", "ostf_sha": "17f2fe6e56452f8e2f01a385be4c4b87bf3698a8", "fuelmain_sha": "0ed7471818bc50699d33e217c28114234c08c8ee", "astute_sha": "183fe05cd59a5ce6d154fa263a5a8bf5f27db0ec", "release": "5.0", "fuellib_sha": "4d942716e3cc8b40ad868d0c4836c6010fbdc42e"}

To reproduce:
1) Deploy HA with 1 controller node:
* CentOS or Ubuntu, Neutron+GRE, tagged all
* Savanna, Murano, Ceilometer
* Roles: Controller, Cinder LVM

Swift errors:

ERR
 (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) change from notrun to 0 1 failed: swift-ring-builder /etc/swift/object.builder rebalance returned 2 instead of one of [0,1]

TRACE:
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) An error has occurred during ring validation. Common
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) causes of failure are rings that are empty or do not
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) have enough devices to accommodate the replica count.
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) Original exception message:
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) There are no devices in this ring, or all devices have been deleted
<29>Apr 10 12:11:29 node-5 puppet-user[1495]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns)

After that, Service[swift-object-replicator] also fails to start:

(/Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Service[swift-object-replicator]/ensure) change from stopped to running failed: Could not start Service[swift-object-replicator]: Execution of '/usr/bin/swift-init object-replicator start' returned 1:
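
For context on the "returned 2 instead of one of [0,1]" message: swift-ring-builder documents its exit codes as 0 for success, 1 for a warning and 2 for an error, and the Exec here accepts only the first two. The following is a minimal Python sketch of that check, not Puppet's actual implementation; check_exit and ACCEPTED are hypothetical names introduced for illustration.

```python
# Exit-code convention documented by swift-ring-builder:
# 0 = success, 1 = completed with warnings, 2 = error.
EXIT_SUCCESS, EXIT_WARNING, EXIT_ERROR = 0, 1, 2

# Assumption: mirrors an Exec that accepts return codes [0, 1].
ACCEPTED = (EXIT_SUCCESS, EXIT_WARNING)


def check_exit(command, returncode, accepted=ACCEPTED):
    """Return None if the exit code is accepted, else an error string."""
    if returncode in accepted:
        return None
    allowed = ",".join(str(code) for code in accepted)
    return f"{command} returned {returncode} instead of one of [{allowed}]"


cmd = "swift-ring-builder /etc/swift/object.builder rebalance"
print(check_exit(cmd, EXIT_ERROR))
```

An exit code of 2 is therefore a hard failure of the rebalance itself, which matches the "no devices in this ring" exception in the trace.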

Tags: icehouse swift
tags: added: icehouse
Changed in fuel:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.0
tags: added: isehouse
removed: icehouse
tags: added: icehouse
removed: isehouse
Sergey Vasilenko (xenolog) wrote :

I can't reproduce it on CentOS.

Vladimir Kuklin (vkuklin) wrote :

This was somehow reproduced on Ubuntu, but neither redeploying the same environment nor manually rebuilding the swift rings with the same parameters reproduced the issue again.

The only thing the two environments have in common is that each has a single controller. In that case the deployment may end with this error.

Changed in fuel:
status: Confirmed → Incomplete
Anastasia Palkina (apalkina) wrote :
Vladimir Kuklin (vkuklin) wrote :

Confirmed and Triaged:

This happens when the rebalance runs before any devices have been added to the ring. It affects only the speed of deployment, because puppet retries and the second attempt succeeds.

<31>Apr 16 11:56:53 node-7 puppet-user[1252]: Executing '/bin/sh -c swift-ring-builder /etc/swift/account.builder rebalance'
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) -------------------------------------------------------------------------------
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) An error has occurred during ring validation. Common
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) causes of failure are rings that are empty or do not
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) have enough devices to accommodate the replica count.
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Original exception message:
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) There are no devices in this ring, or all devices have been deleted
<29>Apr 16 11:56:53 node-7 puppet-user[1252]: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) -------------------------------------------------------------------------------
<27>Apr 16 11:56:53 node-7 puppet-user[1252]: swift-ring-builder /etc/swift/account.builder rebalance returned 2 instead of one of [0,1]

<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Prefetching swift_ring_builder resources for ring_account_device
<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<29>Apr 16 11:56:54 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) node name: 192.168.1.4:6002
<29>Apr 16 11:56:54 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) available devs: ["1", "2"]
<31>Apr 16 11:56:54 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder'
<29>Apr 16 11:56:55 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) *** create device: 2
<31>Apr 16 11:56:55 node-7 puppet-user[1252]: Executing '/usr/bin/swift-ring-builder /etc/swift/account.builder add z7-192.168.1.4:6002/2 1'
<29>Apr 16 11:56:56 node-7 puppet-user[1252]: (Ring_account_device[192.168.1.4:6002](provider=swift_ring_builder)) *** create ...

Changed in fuel:
status: Incomplete → Triaged
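
The triage above (rebalance running before any device is in the ring, then succeeding on retry) can be illustrated with a toy model. This is a sketch under stated assumptions: ToyRing, run_once and the device names are hypothetical and do not reflect Swift's RingBuilder or Puppet's internals; only the error text is taken from the log.

```python
class EmptyRingError(Exception):
    """Raised when a rebalance is attempted on a ring with no devices."""


class ToyRing:
    """Toy stand-in for a ring builder file; not Swift's RingBuilder."""

    def __init__(self):
        self.devices = []
        self.balanced = False

    def add_device(self, name):
        # Idempotent, like a resource that is already in sync on a later run.
        if name not in self.devices:
            self.devices.append(name)
            self.balanced = False

    def rebalance(self):
        if not self.devices:
            raise EmptyRingError(
                "There are no devices in this ring, "
                "or all devices have been deleted"
            )
        self.balanced = True


def run_once(ring, steps):
    """Apply steps in the given order and return the names of failed steps."""
    failed = []
    for name, action in steps:
        try:
            action(ring)
        except EmptyRingError:
            failed.append(name)
    return failed


# Assumed bad ordering: rebalance is scheduled before the devices are added.
steps = [
    ("rebalance", lambda ring: ring.rebalance()),
    ("add dev1", lambda ring: ring.add_device("dev1")),
    ("add dev2", lambda ring: ring.add_device("dev2")),
]

ring = ToyRing()
first_run = run_once(ring, steps)   # rebalance fails, devices get added
second_run = run_once(ring, steps)  # devices already exist, rebalance works
print(first_run, second_run, ring.balanced)
```

In this model the first run leaves the ring populated but unbalanced, which is why a second run with the same ordering succeeds and only deployment time is lost.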
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/88332
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=95576e82c77981d832dac299bf22930719f22bb2
Submitter: Jenkins
Branch: master

commit 95576e82c77981d832dac299bf22930719f22bb2
Author: Vladimir Kuklin <email address hidden>
Date: Wed Apr 16 21:34:47 2014 +0400

    Fix swift-ring-builder ordering

    Tie ring_*_device resource to ring_devices resource
    by setting autorequire, thus making ring_devices resource
    run after ring_*_device and ensure it is before rebalance
    anchor

    How to test:

    1) install swift binaries

    2) run:
    rm /var/lib/puppet/state/graphs/* && puppet apply --graph -vd
    --modulepath /home/vvk/git/fuel/deployment/puppet/ -e
    'anchor{"rebalance_begin":} Ring_devices<||> ->
    Anchor['rebalance_begin'] ring_devices{'all': storages=>
    [{"storage_address"=>"10.10.10.10"}] }'

    3) read /var/lib/puppet/state/graphs/relationships.dot

    The purpose is to set all ring_*_device before
    Anchor['rebalance_begin']

    Change-Id: If9e724342eeeb8399093103a875719b82a429bc5
    Closes-Bug: #1305826
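
The ordering that this commit message describes can be sketched as a dependency graph sorted topologically. This is only an illustration in Python using the standard graphlib module (Python 3.9+), not how Puppet builds or walks its catalog. The resource titles are partly assumed: Ring_object_device[192.168.1.4:6000] and the edge from the anchor to the rebalance Execs are hypothetical, and apply_order is a name introduced here.

```python
from graphlib import TopologicalSorter


def apply_order(dependencies):
    """Return one valid apply order for a {resource: prerequisites} mapping."""
    return list(TopologicalSorter(dependencies).static_order())


# Each key must run after every resource listed in its value.
dependencies = {
    # ring_devices runs after every ring_*_device (the autorequire tie).
    "Ring_devices[all]": {
        "Ring_account_device[192.168.1.4:6002]",
        "Ring_object_device[192.168.1.4:6000]",  # hypothetical title
    },
    # Ring_devices<||> -> Anchor['rebalance_begin']
    "Anchor[rebalance_begin]": {"Ring_devices[all]"},
    # Assumption: the rebalance Execs are ordered after the anchor.
    "Exec[rebalance_account]": {"Anchor[rebalance_begin]"},
    "Exec[rebalance_object]": {"Anchor[rebalance_begin]"},
}

order = apply_order(dependencies)
for position, resource in enumerate(order, start=1):
    print(position, resource)
```

With these edges every valid order places both device resources before Anchor[rebalance_begin], so a rebalance can no longer be scheduled against an empty ring.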

Changed in fuel:
status: In Progress → Fix Committed
Artem Panchenko (apanchenko-8) wrote :

Verified on ISO # 147

api: '1.0'
astute_sha: 3cffebde1e5452f5dbf8f744c6525fc36c7afbf3
build_id: 2014-04-28_01-00-26
build_number: '147'
fuellib_sha: 10d2775b21102b1dcda00e355cc364895d230236
fuelmain_sha: 68bed4f785ff045be6ef16de19213c9a36a42fce
mirantis: 'yes'
nailgun_sha: 4ad5fe676fd7acac83310e97bc84cfa30f8418b4
ostf_sha: 134765fcb5a07dce0cd1bb399b2290c988c3c63b
production: prod
release: '5.0'

Changed in fuel:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/4.1)

Fix proposed to branch: stable/4.1
Review: https://review.openstack.org/96845

no longer affects: fuel/5.0.x
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/4.1)

Reviewed: https://review.openstack.org/96845
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d3b4818b7b3ded1e7b52781efb7328ec4d0d17de
Submitter: Jenkins
Branch: stable/4.1

commit d3b4818b7b3ded1e7b52781efb7328ec4d0d17de
Author: Vladimir Kuklin <email address hidden>
Date: Wed Apr 16 21:34:47 2014 +0400

    Fix swift-ring-builder ordering

    Tie ring_*_device resource to ring_devices resource
    by setting autorequire, thus making ring_devices resource
    run after ring_*_device and ensure it is before rebalance
    anchor

    How to test:

    1) install swift binaries

    2) run:
    rm /var/lib/puppet/state/graphs/* && puppet apply --graph -vd
    --modulepath /home/vvk/git/fuel/deployment/puppet/ -e
    'anchor{"rebalance_begin":} Ring_devices<||> ->
    Anchor['rebalance_begin'] ring_devices{'all': storages=>
    [{"storage_address"=>"10.10.10.10"}] }'

    3) read /var/lib/puppet/state/graphs/relationships.dot

    The purpose is to set all ring_*_device before
    Anchor['rebalance_begin']

    Change-Id: If9e724342eeeb8399093103a875719b82a429bc5
    Closes-Bug: #1305826

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/107749

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/107767

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/5.0)

Change abandoned by Vladimir Kuklin (<email address hidden>) on branch: stable/5.0
Review: https://review.openstack.org/107767

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/107749
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=65014034df4a6e31438fd5414ea0f6207cf3c4dc
Submitter: Jenkins
Branch: master

commit 65014034df4a6e31438fd5414ea0f6207cf3c4dc
Author: Vladimir Kuklin <email address hidden>
Date: Thu Jul 17 19:21:42 2014 +0400

    Change resource generation type

    Use eval_generate to generate swift rings.
    This will ensure that Ring_devices resource
    is completed only when all the Ring_*_devices
    is finished, thus ensuring that rebalance begins
    only after all the devices were created.

    Also pin services start to rebalance end.

    Change-Id: If8c64a565ebb86f545add145934d566d666ef073
    Closes-bug: #1305826
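
As a rough illustration of the behaviour this commit message describes (a parent resource that generates its children while it is being evaluated and counts as complete only after they finish), here is a toy Python model. It is a sketch based solely on the description above, not Puppet's transaction code; evaluate, discover_devices and the device titles are hypothetical.

```python
def evaluate(name, generators, done):
    """Evaluate one resource, finishing any resources it generates first."""
    generator = generators.get(name)
    if generator is not None:
        # Children are only discovered at evaluation time.
        for child in generator():
            evaluate(child, generators, done)
    # The parent is recorded as complete only after all of its children.
    done.append(name)


def discover_devices():
    # Hypothetical device list; a real provider would inspect storage nodes.
    return ["Ring_account_device[dev1]", "Ring_object_device[dev1]"]


generators = {"Ring_devices[all]": discover_devices}

# Assumed plan: devices, then rebalance, then the service start.
plan = [
    "Ring_devices[all]",
    "Exec[rebalance_object]",
    "Service[swift-object-replicator]",
]

done = []
for resource in plan:
    evaluate(resource, generators, done)
print(done)
```

Because the generated device resources finish inside the evaluation of Ring_devices[all], anything ordered after that resource, including the rebalance and the service start pinned to its end, sees a populated ring.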

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.0)

Reviewed: https://review.openstack.org/107767
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=b115bcfb57b8a99fc68b39f6be19b29b2e174464
Submitter: Jenkins
Branch: stable/5.0

commit b115bcfb57b8a99fc68b39f6be19b29b2e174464
Author: Vladimir Kuklin <email address hidden>
Date: Thu Jul 17 19:21:42 2014 +0400

    Change resource generation type

    Use eval_generate to generate swift rings.
    This will ensure that Ring_devices resource
    is completed only when all the Ring_*_devices
    is finished, thus ensuring that rebalance begins
    only after all the devices were created.

    Also pin services start to rebalance end.

    Change-Id: If8c64a565ebb86f545add145934d566d666ef073
    Closes-bug: #1305826

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
Kirill Omelchenko (komelchenko) wrote :

Verified on both CentOS and Ubuntu.

{
    "build_id": "2014-08-26_00-01-17",
    "ostf_sha": "907f25f8fad39b177bf6a66fba9785afa7dd8008",
    "build_number": "478",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "44876ddae29823449e0cbc59428aafa466cbbbc2",
    "production": "docker",
    "fuelmain_sha": "74ad3dd68020aac3042f62c59c137498474ecbee",
    "astute_sha": "bc60b7d027ab244039f48c505ac52ab8eb0a990c",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1",
    "fuellib_sha": "ca5aa450ea3da771c2d5f1e82511450f0b8faf28"
}

Kirill Omelchenko (komelchenko) wrote :

The version above is incorrect. The correct one is 5.0.2, build 57.

tags: added: swift
Tom Fifield (fifieldt)
Changed in fuel:
status: Fix Committed → Fix Released