swift-ring-builder write_builder does not respect fractional replica count, always sets int value

Bug #1677547 reported by Alistair Coles
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Low
Unassigned

Bug Description

If you have a ring file with fractional replica count and use swif-ring-builder write_builder to create a buider file for that ring, the builder will have ceil(fractional replica count) replicas.

We should at least document/warn that this happens.

Example:

Create ring with 3.2 replicas...

swift@vm-16:~/rings$ swift-ring-builder fractional.builder create 6 3.2 0
swift@vm-16:~/rings$ swift-ring-builder fractional.builder add r1z1-127.0.0.1:6201/sda 100
Device d0r1z1-127.0.0.1:6201R127.0.0.1:6201/sda_"" with 100.0 weight got id 0
swift@vm-16:~/rings$ swift-ring-builder fractional.builder add r1z1-127.0.0.1:6201/sdb 100
Device d1r1z1-127.0.0.1:6201R127.0.0.1:6201/sdb_"" with 100.0 weight got id 1
swift@vm-16:~/rings$ swift-ring-builder fractional.builder add r1z1-127.0.0.1:6201/sdc 100
Device d2r1z1-127.0.0.1:6201R127.0.0.1:6201/sdc_"" with 100.0 weight got id 2
swift@vm-16:~/rings$ swift-ring-builder fractional.builder add r1z1-127.0.0.1:6201/sdd 100
Device d3r1z1-127.0.0.1:6201R127.0.0.1:6201/sdd_"" with 100.0 weight got id 3
swift@vm-16:~/rings$ swift-ring-builder fractional.builder add r1z1-127.0.0.1:6201/sde 100
Device d4r1z1-127.0.0.1:6201R127.0.0.1:6201/sde_"" with 100.0 weight got id 4
swift@vm-16:~/rings$ swift-ring-builder fractional.builder rebalance
Reassigned 48 (75.00%) partitions. Balance is now 2.34. Dispersion is now 0.00

Write ring file, check ring state...

swift@vm-16:~/rings$ swift-ring-builder fractional.builder write_ring
swift@vm-16:~/rings$ swift-ring-builder fractional.builder
fractional.builder, build version 6
64 partitions, 3.200000 replicas, 1 regions, 1 zones, 5 devices, 2.34 balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 0 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file fractional.builder.bak.ring.gz not found, probably it hasn't been written yet
Devices: id region zone ip address:port replication ip:port name weight partitions balance flags meta
            0 1 1 127.0.0.1:6201 127.0.0.1:6201 sda 100.00 41 0.10
            1 1 1 127.0.0.1:6201 127.0.0.1:6201 sdb 100.00 41 0.10
            2 1 1 127.0.0.1:6201 127.0.0.1:6201 sdc 100.00 41 0.10
            3 1 1 127.0.0.1:6201 127.0.0.1:6201 sdd 100.00 41 0.10
            4 1 1 127.0.0.1:6201 127.0.0.1:6201 sde 100.00 40 -2.34

"Lose" the builder file...

swift@vm-16:~/rings$ mv fractional.builder fractional.builder.bak

Write a replacement builder file...

swift@vm-16:~/rings$ swift-ring-builder fractional.ring.gz write_builder
Note: using fractional.builder instead of fractional.ring.gz as builder file
WARNING: default min_part_hours may not match the value in the lost builder.

...which will have 4 replicas, not 3.2...

swift@vm-16:~/rings$ swift-ring-builder fractional.builder
fractional.builder, build version 0
64 partitions, 4.000000 replicas, 1 regions, 1 zones, 5 devices, 21.88 balance
The minimum number of hours before a partition can be reassigned is 24 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file fractional.ring.gz is up-to-date
Devices: id region zone ip address:port replication ip:port name weight partitions balance flags meta
            0 1 1 127.0.0.1:6201 127.0.0.1:6201 sda 100.00 41 -19.92
            1 1 1 127.0.0.1:6201 127.0.0.1:6201 sdb 100.00 41 -19.92
            2 1 1 127.0.0.1:6201 127.0.0.1:6201 sdc 100.00 41 -19.92
            3 1 1 127.0.0.1:6201 127.0.0.1:6201 sdd 100.00 41 -19.92
            4 1 1 127.0.0.1:6201 127.0.0.1:6201 sde 100.00 40 -21.88

Revision history for this message
Tim Burke (1-tim-z) wrote :

Makes perfect sense given how we write our ring metadata [1]. Would it be enough to fix write_ring, or do we also want to fix write_builder to eat old rings better?

[1] https://github.com/openstack/swift/blob/2.13.0/swift/common/ring/ring.py#L126

Revision history for this message
clayg (clay-gerrard) wrote :

IMHO it'd be plenty good enough to improve write_builder more - or even just blow up

if len(set(len(p) for p in replica2part2dev))) != 1:
    raise ValueError('write_builder does not support fractional replicas')

As I've said, this isn't so much for protection against lost builders (YOU SHOULD HAVE BACKUPS OF YOUR BUILDERS) it is just a convenience.

IME fractional replicas are not a long term situation - so there's a marginal chance no one would even notice write_builder doesn't support this weird transitional period of a ring (they hadn't noticed this far?)

Changed in swift:
status: New → Confirmed
importance: Undecided → Low
tags: added: low-hanging-fruit
Revision history for this message
Tim Burke (1-tim-z) wrote :

I noticed something else strange in that output: the initial rebalance says

  Reassigned 48 (75.00%) partitions.

But sure looks to me like we assigned 204 partitions...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Tim Burke (<email address hidden>) on branch: master
Review: https://review.openstack.org/460302
Reason: Superseded by https://review.openstack.org/#/c/503454/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/503454
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=89a5c9d56fda8282292253d2d056121694ef7654
Submitter: Zuul
Branch: master

commit 89a5c9d56fda8282292253d2d056121694ef7654
Author: Tim Burke <email address hidden>
Date: Tue Sep 12 22:46:14 2017 +0000

    Disallow fractional replicas in EC policies

    Change-Id: I873d7bf7de54e4b1dccdafc8a61f03c09a65dfbc
    Closes-Bug: 1554391
    Closes-Bug: 1677547

Changed in swift:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.17.0

This issue was fixed in the openstack/swift 2.17.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/541233

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)
Download full text (12.3 KiB)

Reviewed: https://review.openstack.org/541233
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e473955900dde31dc170bf485e718fdf65955650
Submitter: Zuul
Branch: feature/deep

commit d9eec63a588e1e1e574b71d67b5d7456c3ad468d
Author: James E. Blair <email address hidden>
Date: Wed Jan 24 17:05:13 2018 -0800

    Zuul: Remove project name

    Zuul no longer requires the project-name for in-repo configuration.
    Omitting it makes forking or renaming projects easier.

    Change-Id: I2226881d18e69bf25ad93ee8b8db67248e14f697

commit f4cfe81e593b52e11e67916073a050cd2dde2e00
Author: John Dickinson <email address hidden>
Date: Tue Jan 16 14:51:29 2018 -0800

    authors/changelog updates for 2.17.0 release

    Change-Id: I577d169022916676a20a9ac24c7cc7b63ae46778

commit 8140b7e7ad8eaef0ca110c0f5e185a046b69fd6f
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Thu Feb 1 09:43:46 2018 +0000

    Fix inconsistency of account info in expirer's unit tests

    In expirer's unit tests, FakeInternalClient instances simulates
    expirer's task queue behavior. But get_account_info method of
    the FakeInternalClient returns container count = 1 and object
    count = 2, even if it simulate different count of containers or
    objects.

    This patch fixes the behavior. The return values of get_account_info
    will be equal to simulated container and object counts.

    This patch will make review for expirer's task queue upgrade patch [1]
    more easy.

    [1]: https://review.openstack.org/#/c/517389

    Change-Id: Id5339ea7e10e4577ff22daeb91ec90f08704c98d

commit 924a043c70ba38c5d81758b5e0c29fa73674404a
Author: Samuel Merritt <email address hidden>
Date: Wed Jan 31 16:40:21 2018 -0800

    Remove some cruft from ratelimit tests

    The tests were carefully setting up a mock for the 'http_connect'
    function in the ratelimit module, but there is no such function
    imported by the ratelimit module.

    As far as I can tell, this has been the case since the ratelimit
    middleware first appeared in 72d40bd (Mon Oct 4 14:11:48 2010 -0700).

    Change-Id: If047184c6435aa1647050f50b499dc9feff4318d

commit 5cac37284e1d3f3019309ca86f73e3dbb176df59
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Thu Jan 25 10:49:00 2018 +0000

    Refactor expirer unit tests

    In expirer's unit tests, fake InternalClient classes are defined
    and its instance simulates expirer's task queue behaviors.

    To make review for expirer's task queue update patch [1] easy,
    this patch refactors the implementation of the fake InternalClient
    classes. In this patch, unit tests are refactored by the following
    two approaches:
        #1: Summarizing duplicated fake InternalClient implementation
        #2: Make task account name variable
    The #2 approach is for multiple task accounts in [1].

    The patch [1] will be rebased after this patch merged.

    [1]: https://review.openstack.org/#/c/517389

    Change-Id: I10a7151cfdd43460ad38c47f672d3c31b77e7990

commit 1c306660a53447bb0a696873bd3f88e66e3781eb
Author: OpenStack Proposal Bot <openstack-...

tags: added: in-feature-deep
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/548052

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/548193

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/s3api)

Change abandoned by Kota Tsuyuzaki (<email address hidden>) on branch: feature/s3api
Review: https://review.openstack.org/548052
Reason: This is affected by the new s3api functests added in recent. Use https://review.openstack.org/#/c/548193/ instead.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)
Download full text (30.7 KiB)

Reviewed: https://review.openstack.org/548193
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=9d43f4e190802cb0cad507a65245e4dd70fda7db
Submitter: Zuul
Branch: feature/s3api

commit b3f1558acd2f5a4df3f039070ca5bbc33935e822
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Tue Feb 13 05:16:27 2018 +0000

    Fix expirer's invalid task object names in unit tests

    Object-expirer's task name should be in format of
    "<timestamp>-<account>/<container>/<obj>". In object-expirer
    implementation, ValueError is catched and handled when expirer's task
    objects have invalid name. But in actual swift cluster, invalid task
    object name is not created because task object is created by
    object-server.
    However, without the ValueError catching, some unit tests fail,
    because the unit tests create invalid task object names.

    This patch fixes invalid task object names in unit tests. The
    ValueError catch is remained for unexpected errors, but in the case
    the task will be skipped.

    This patch will help to refactor expirer's task object parsing.

    Change-Id: I8fab8fd180481ce9e97c945904c5c89eec037110

commit 4b19ac772364778a4b96d7e18834db9a7645f482
Author: Tim Burke <email address hidden>
Date: Thu Feb 1 14:30:19 2018 -0800

    py3: port common/storage_policy.py

    Change-Id: I7030280a8495628df9ed8edcc8abc31f901da72e

commit 25540a415e7e36bb08a01a14ca41e2d7591e62cb
Author: Tim Burke <email address hidden>
Date: Thu Feb 22 11:08:49 2018 -0800

    Tighten up assertions around expirer's concurrency

    In particular, test that each work item is only done *once*.

    Change-Id: I9cc610bffb2aa9a2f2b05f4c49e574ab56d05201
    Related-Change: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

commit 5017864133b5af289f205afaf76ffe4518688b3f
Author: melissaml <ma.lei@99cloud.net>
Date: Mon Feb 26 15:48:31 2018 +0800

    Fix the incorrect reference links

    TrivialFix
    [1] is the installation guide for OpenStack components, obviously,
    we need [1] in the docs.

    [1] https://docs.openstack.org/latest/install/

    Change-Id: I3c6fe7327f5552cc2b8f0f0e42b41f8e989a0a7e

commit 58f5d89066adbd54403ad315ffe1f9b2f05aa583
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Tue Feb 13 03:36:04 2018 +0000

    Remove confusing assertion from expirer's unit test

    In test_expirer.TestObjectExpirer.test_process_based_concurrency,
    an assertion checks that expirer execute tasks in round-robin order
    for target containers. But the assertion depends on task object path,
    because task assignation for each process depends on md5 of task
    object path. The dependency makes the assetion confusing.

    Now, we have test_expirer.TestObjectExpirer.test_round_robin_order which
    is added in [1]. So this patch remove the confusing assertion.

    This patch will help to refactor expirer's task object parsing.
    I will push patch for the refactoring after this patch.

    [1]: https://review.openstack.org/#/c/538171

    Change-Id: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

commit 6060af8db96e23d32f3ecc3d02f7f...

tags: added: in-feature-s3api
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.