container partition directories never removed

Bug #1396152 reported by Caleb Tennis
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

This was thought to have been originally fixed in https://bugs.launchpad.net/swift/+bug/768816; however, the actual partition directories never seem to be cleaned up (the suffix directories are).

The issue is that the container replicator loops over the partition directories at the start of a run, so having a large number of empty directories across many disks significantly degrades the performance of the run.

(This may also be a problem for accounts, but I'm not entirely sure.)
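
For context, a minimal sketch of the kind of partition walk the replicator does before a run (the /srv/node layout and helper name here are assumptions for illustration, not Swift's actual replicator code); every leftover empty partition directory still costs a directory listing, so thousands of them across many disks add measurable startup time:

    import os

    # Illustrative only: gather every partition directory under each device's
    # "containers" tree, the way a replicator builds its job list up front.
    def collect_container_partitions(devices_root="/srv/node"):
        partitions = []
        for device in os.listdir(devices_root):
            containers_dir = os.path.join(devices_root, device, "containers")
            if not os.path.isdir(containers_dir):
                continue
            for partition in os.listdir(containers_dir):
                # Empty partition dirs are still listed and stat'd here,
                # even though they contain no work to replicate.
                partitions.append(os.path.join(containers_dir, partition))
        return partitions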

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/138524
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=d40cebfe55a2ef63e0fdceb21548587bad497a69
Submitter: Jenkins
Branch: master

commit d40cebfe55a2ef63e0fdceb21548587bad497a69
Author: Caleb Tennis <email address hidden>
Date: Tue Dec 2 15:28:26 2014 -0500

    Clean up empty account and container partitions directories.

    Because we iterate over these directories on a replication run,
    and they are not (previously) cleaned up, the time to start the
    replication increases incrementally for each stale directory
    lying around. Thousands of directories across dozens of disks
    on a single machine can make for non-trivial startup times.

    Plus it just seems like good housekeeping.
    Closes-Bug: #1396152

    Change-Id: Iab607b03b7f011e87b799d1f9af7ab3b4ff30019
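
The idea behind the fix is opportunistic housekeeping: once a partition's suffix directories are gone, drop the now-empty partition directory as well. A hedged sketch of that idea (not the merged patch itself), using a guarded rmdir so that a racing writer recreating the partition is harmless:

    import errno
    import os

    def remove_partition_if_empty(partition_path):
        """Best-effort removal of an empty partition directory (sketch only)."""
        try:
            os.rmdir(partition_path)  # succeeds only if the directory is empty
        except OSError as err:
            # Not empty or already gone: nothing to do. Anything else is real.
            if err.errno not in (errno.ENOTEMPTY, errno.ENOENT):
                raise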

Changed in swift:
status: New → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/139255

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/139870

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/139870
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=132f8b3169cd0b5ba094736b16fbc75ccc11551e
Submitter: Jenkins
Branch: feature/ec

commit cc2f0f4ed6f12554b7d8e8cb61e14f2b103445a0
Author: Samuel Merritt <email address hidden>
Date: Thu Dec 4 18:37:24 2014 -0800

    Speed up reading and writing xattrs for object metadata

    Object metadata is stored as a pickled hash: first the data is
    pickled, then split into strings of length <= 254, then stored in a
    series of extended attributes named "user.swift.metadata",
    "user.swift.metadata1", "user.swift.metadata2", and so forth.

    The choice of length 254 is odd, undocumented, and dates back to the
    initial commit of Swift. From talking to people, I believe this was an
    attempt to fit the first xattr in the inode, thus avoiding a
    seek. However, it doesn't work. XFS _either_ stores all the xattrs
    together in the inode (local), _or_ it spills them all to blocks
    located outside the inode (extents or btree). Using short xattrs
    actually hurts us here; by splitting into more pieces, we end up with
    more names to store, thus reducing the metadata size that'll fit in
    the inode.

    [Source: http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Extended_Attributes.html]

    I did some benchmarking of read_metadata with various xattr sizes
    against an XFS filesystem on a spinning disk, no VMs involved.

    Summary:

     name | rank | runs |      mean |        sd | timesBaseline
    ------|------|------|-----------|-----------|--------------
    32768 |    1 | 2500 | 0.0001195 |  3.75e-05 | 1.0
    16384 |    2 | 2500 | 0.0001348 | 1.869e-05 | 1.12809122912
     8192 |    3 | 2500 | 0.0001604 | 2.708e-05 | 1.34210998858
     4096 |    4 | 2500 | 0.0002326 | 0.0004816 | 1.94623473988
     2048 |    5 | 2500 | 0.0003414 | 0.0001409 | 2.85674781189
     1024 |    6 | 2500 | 0.0005457 | 0.0001741 | 4.56648611635
      254 |    7 | 2500 |  0.001848 |  0.001663 | 15.4616067887

    Here, "name" is the chunk size for the pickled metadata. A total
    metadata size of around 31.5 KiB was used, so the "32768" runs
    represent storing everything in one single xattr, while the "254" runs
    represent things as they are without this change.

    Since bigger xattr chunks make things go faster, the new chunk size is
    64 KiB. That's the biggest xattr that XFS allows.

    Reading of metadata from existing files is unaffected; the
    read_metadata() function already handles xattrs of any size.

    On non-XFS filesystems, this is no worse than what came before:

    ext4 has a limit of one block (typically 4 KiB) for all xattrs (names
    and values) taken together [1], so this change slightly increases the
    amount of Swift metadata that can be stored on ext4.

    ZFS let me store an xattr with an 8 MiB value, so that's plenty. It'll
    probably go further, but I stopped there.

    [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Extended_Attributes

    Change-Id: Ie22db08ac0050eda693de4c30d4bc0d...
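
For readers unfamiliar with the scheme this commit tunes, a rough sketch of chunked xattr metadata storage as described above (Linux-only os.setxattr/os.getxattr; an illustration of the approach, not Swift's actual diskfile code):

    import os
    import pickle

    XATTR_KEY = "user.swift.metadata"
    CHUNK_SIZE = 65536  # the new 64 KiB chunk size; the old value was 254

    def write_metadata(path, metadata):
        # Pickle once, then spread the bytes across numbered xattrs:
        # user.swift.metadata, user.swift.metadata1, user.swift.metadata2, ...
        blob = pickle.dumps(metadata, protocol=2)
        for offset in range(0, len(blob), CHUNK_SIZE):
            suffix = "" if offset == 0 else str(offset // CHUNK_SIZE)
            os.setxattr(path, XATTR_KEY + suffix, blob[offset:offset + CHUNK_SIZE])

    def read_metadata(path):
        # Reassemble the chunks in order until a key is missing, then unpickle.
        chunks, index = [], 0
        while True:
            suffix = "" if index == 0 else str(index)
            try:
                chunks.append(os.getxattr(path, XATTR_KEY + suffix))
            except OSError:
                break
            index += 1
        return pickle.loads(b"".join(chunks))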


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/juno)

Reviewed: https://review.openstack.org/139255
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=70e35c6084ddb62a6124cdd5ba35f29748dfd746
Submitter: Jenkins
Branch: stable/juno

commit 70e35c6084ddb62a6124cdd5ba35f29748dfd746
Author: Caleb Tennis <email address hidden>
Date: Tue Dec 2 15:28:26 2014 -0500

    Clean up empty account and container partitions directories.

    Because we iterate over these directories on a replication run,
    and they are not (previously) cleaned up, the time to start the
    replication increases incrementally for each stale directory
    lying around. Thousands of directories across dozens of disks
    on a single machine can make for non-trivial startup times.

    Plus it just seems like good housekeeping.
    Closes-Bug: #1396152

    Change-Id: Iab607b03b7f011e87b799d1f9af7ab3b4ff30019

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.1
status: Fix Committed → Fix Released
Revision history for this message
clayg (clay-gerrard) wrote :

This issue was finally addressed/closed when we fixed https://bugs.launchpad.net/swift/+bug/1583719
